ABSTRACT

The objective of this thesis was to colligate the
various strands of research in the literature of computational
linguistics that have to do with the computational treatment of
semantic content so as to encode it into a computerized dictionary.
In chapter 1 the course of mechanical translation (1947-1960) and
quantitative linguistics is traced to demonstrate the limitations of
computational linguistics without semantics. Chapter 2 covers
linguistic research in the 1960's, which was essentially an offshoot
of transformational grammar. In chapter 3, various classification
schemes are examined as a body of experience from which to draw
conclusions on the constraints to which the construction of a
computerized dictionary is subject. Chapter 4 is a synthesis of all
this data in the form of a model dictionary entry. In chapters 2 and
3 the approaches to semantics are of two types. In one, the semantic
categories for each dictionary entry were in the form of unordered
elements, and the means of applying them in text was placed within
the realm of grammar. In the other type, syntagmatic relationships
occurred between the encoded components of dictionary definitions
just as they did between those of utterances in a text. The
conclusion reached is that the latter type of approach provides
firmer foundations upon which to set up a computerized dictionary, as
it shows how information is structured in terms of its application in
text. (Author)
ABSTRACT

The objective was to colligate the various strands of
research in the literature of computational linguistics that have to
do with the computational treatment of semantic content so as to
encode it into a computerised dictionary. In chapter 1 the course
of mechanical translation (1947-1960) and quantitative linguistics
is traced to demonstrate the limitations of computational linguistics
without semantics. Chapter 2 covers linguistic research in the
1960's, which was essentially an offshoot of transformational grammar.
In chapter 3, various classification schemes are examined as a body
of experience from which to draw conclusions on the constraints to
which the construction of a computerised dictionary is subject.
Chapter 4 is a synthesis of all this data in the form of a model
dictionary entry.

In chapters 2 and 3 the approaches to semantics are of two
types. In one, the semantic categories for each dictionary entry
were in the form of unordered elements, and the means of applying
them in text was placed within the realm of grammar. In the other
type, syntagmatic relationships occurred between the encoded
components of dictionary definitions just as they did between those
of utterances in a text.

The conclusion reached is that the latter type of approach
provides firmer foundations upon which to set up a computerised
dictionary, as it shows how information is structured in terms of its
application in text. In the model structured entry the components of
the definitions of a word are tagged according to their part of
speech. The representation of a discourse then is determined by
mapping out the dictionary definitions of words as they occur in text.

The necessity for integrating semantic components into the
structure of a dictionary was brought out at the sixth annual
symposium on mechanical translation held by the National Research
Council at Ottawa in April, 1972. Several teams apparently achieved
some results through the parsing of texts by means of ad hoc
dictionaries. Since the problems of cross-referencing and of the
recognition of semantic grouping remain unsolved, the key to success
even in mechanical translation lies in computerised lexicography. It
seems unlikely that much more progress will be achieved without it.
While the solution of these complex problems is beyond the scope of
this single thesis, a formulation of them is provided as a starting
point for further research.
INTRODUCTION

This thesis is in support of the single approach to
mechanical translation, fact retrieval and document retrieval, in
which semantic content is organised into a dictionary for computation
as opposed to the separate and thus far ad hoc treatments of each.
The success of non-semantic approaches in other areas of computational
linguistics has been demonstrated. In stylostatistics, for example,
while it is usual for a human researcher to focus on the meaning of
texts of disputed authorship, it is not mandatory. The idiosyncrasies
of an author's style may be observed in the appearance a given number
of times of certain words or morphemes or other marked indices that a
machine may readily identify.

In mechanical translation and fact retrieval, too, marked
indices were sought. Fact retrieval in its most rudimentary form
consisted of counting words in a text and making a summary of it by
extracting the most frequent words. This technique was improved by
classifying words according to subject area into what were called
"notational families" on the hypothesis that the subject area of a
text would be revealed by an accumulation of words belonging to one
area. In mechanical translation such families were named
idioglossaries and were applied to disambiguate words. Such
categorisations of vocabulary were an acknowledgement of the value of
providing an organisation of semantic content as the general
classifications of the type invented by Dewey in the last century for
document retrieval do.
Another line of development consisted of representing
semantic content by means of unordered descriptors. In document
retrieval the special classifications were so constructed in order to
make provision for more than one organisation of information so as to
meet a user's needs. In the period between 1947 and 1960 unordered
descriptors in the form of concept numbers were set up in mechanical
translation to complement the idioglossary in disambiguation. These
numbers, which were based upon statistics that indicated that the
scanning of one or two words on either side of the ambiguous one
would be effective, indicated the selectional restrictions that
allowed words, or more specifically the meanings of words, like
"flowering" and "plant" to be immediate constituents. Disambiguation
would take place through the matching of such concept numbers. In the
1960's Katz and Fodor arrived at a system of markers which were
essentially concept numbers factored into semantic categories.

The main drawback of applying unordered descriptors was
that from a finite vocabulary of them only a finite number of
combinations could be produced, whereas in natural language
syntagmatic structure allowed the creation of potentially infinitely
long and many sentences. Consequently the representation of texts by
descriptors was produced. In coordinate retrieval they seem to be
viable only because of the limits of a library's holdings and,
therefore, of the number of discourses to be represented. This same
principle might appear to apply to the above-mentioned markers on the
ground that as the body of information contained in a dictionary or
encyclopedia is finite, so is the number of semantic categories
necessary. However, the presence of syntagmatic structure in natural
language weighs against it. For example, while categories of the
type abstract and animate might be appropriate in some way in the
dictionary entry for "frighten", they would not be useful by
themselves for detecting, for example, "sincerity frightens John" as
grammatical and "John frightens sincerity" as anomalous.

The key to a computerised semantics lies in explicit
paraphrase in the form of syntagmatically construed elements. This
type of paraphrase has been the basis for research at the Cambridge
Language Research Unit in the 1950's and in the 1960's and 1970's
at Stanford too. Where it is carefully formulated, the necessity for
explicitly stating the paradigmatic relationships between dictionary
entries, such as that of hyponymy, no longer exists. In the latest
research at Stanford the functions of parts of speech and descriptors
have been integrated into what are called semantic elements. These
are the basis for the type of dictionary structure suggested in this
thesis, in which Katz and Fodor's marker tree would be reformulated.

The ramifications of establishing such a structure have not
been investigated in this thesis. The structures provided for the
two examples in chapter 4 serve only as illustrations and are the end
product of a survey of the literature in computational linguistics,
being based upon the elimination of various fruitless approaches.
Such an investigation would require the alignment of the
representations for computation of several words and consequently an
analysis of a large amount of data. The formulation of the semantic
content of dictionary entries with sufficient rigour for testing in
scale, therefore, comes within the province of a work more
comprehensive than this thesis.
1 COMPUTATIONAL LINGUISTICS WITHOUT SEMANTICS, 1947-1960

Before about 1960 no attempt was made in computational
linguistics to provide a model of how semantic content may be
structured for computation. To solve semantic problems recourse was
had to discrete indices of surface structure. In mechanical
translation it was some specific word or words in the context that
was sought to resolve multimeaning. In fact retrieval and in research
on disputed authorship the statistical analysis of word counts
replaced the complex analysis of content by humans.

1.1 The Role of Statistics in Computational Linguistics

1.1.1 It is upon the interpretation of these counts that the
successful use of statistics in computational linguistics depends, for
they may signify facts, for example, either about a given language or
a particular writer's style. Thus a letter may recur because it is a
literary device as in the case of alliteration, or because it is an
affix common to a group of frequently occurring words. The study of
these facts belongs to quantitative linguistics, which, to adapt
Herdan's divisions, may be divided into the following three branches:
literary statistics (or stylostatistics), optimal systems of language
structure and mechanical translation economics.

1.1.1.1 Since writers can choose their own words, the applicability
of statistics to the first branch, in which authorship questions are
involved, may appear surprising. But it is only the first few words
(samples) that are insufficient for statistical procedure. As more
and more words come under study, the author's choice of each
additional word is subject to the rules of grammar, which put
constraints on the possible variations that may occur in the ratios
between occurrences of various words. The analogy which Herdan
draws between De Saussure's "langue-parole" dichotomy and the
dichotomy between population and sample is appropriate. "Parole"
(the individual act of speech), like the sample, is open to individual
choice, but as the number of acts of speech increases they are
constrained by "langue", a body of fixed conventions. Within the
limited range of variation permitted by the language the writer's own
preferences for certain words form a statistical pattern. This
pattern is an attribute of a given style that distinguishes it from
other styles.

This quantitative approach has helped to solve such problems
of literary research as the determination of the chronological order
of texts, although the approach has not been able to show the
development of an individual author's style, and the identification of
authors of hitherto anonymous texts. The author of The Equatorie of
the Planets, for example, has been identified as Chaucer, in part
because of a characteristic of his style, a high proportion of Romance
words.

The problem of disputed authorship was worked upon in detail
by Mosteller and Wallace, in their effort to determine whether
Hamilton or Madison wrote the twelve disputed Federalist Papers. A
statistical approach was used because standard methods of historical
research had not, in Mosteller and Wallace's experience, settled the
issue, although an earlier attempt at using statistics by Mosteller
and Williams in 1941 had also proved inconclusive when sentence length
was tested as a suitable criterion for distinguishing styles. The
average length of Hamilton's sentences was found to be 34.55 words,
almost identical to that of Madison's, which was 34.59 words. In 1959
Mosteller received a clue to distinctive attributes of style from
Adair, who discovered that Hamilton used the word while where Madison
used whilst. Since authors sometimes change their usage, Mosteller
and Wallace looked for more evidence. Some was found in Hamilton's
frequent use of the words upon and enough. Frequency counts of these
words in the disputed papers pointed to Madison as their author.

Since Madison could have merely edited them, other marker
words were sought to corroborate the above finding. Hamilton was
found to use the words by and from less often than Madison, but the
word to more often. Since all these were function words, it was very
unlikely that the frequency counts were due to the content of the
papers. Mosteller and Wallace concluded from the additional
statistical evidence that Madison was the author of the disputed
Federalist Papers.
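
The counting of marker words described above can be sketched in a few lines. This is only an illustration of the idea of comparing rates of function words, not a reconstruction of Mosteller and Wallace's actual procedure; the function name marker_rates, the tokenisation rule and the sample sentence are assumptions made for the example.

```python
import re
from collections import Counter

def marker_rates(text, markers):
    """Return occurrences per 1000 running words for each marker word."""
    # Assumed tokenisation: lower-cased runs of letters count as words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    if total == 0:
        return {m: 0.0 for m in markers}
    return {m: 1000.0 * counts[m] / total for m in markers}

# Invented sample text, used only to exercise the function.
sample = ("Upon reflection the plan was good enough, "
          "while the cost fell upon the states.")
rates = marker_rates(sample, ["upon", "enough", "while", "whilst"])
```

Rates computed in this way for texts of known authorship could then be set beside the rates found in a disputed text; how the comparison is weighed statistically is left open here.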

Although a trial and error technique, the statistical
approach has, therefore, at least as much scope as conventional
literary research. In problems of disputed authorship, a literary
analysis of semantic content can be a disadvantage, since each
researcher has his own bias. In contrast statistics excludes it
through an objective assessment of criteria.

1.1.1.2 The second branch of quantitative linguistics, the study of
the optimal systems of language structure, belongs to what De
Saussure calls semiology, the study of the system of signs expressing
ideas. Herdan applies this study to the written word. In his view,
Morse Code approximates an optimal coding system in that letters are
systematically represented by all possible combinations of dots and
dashes up to length four, the most frequent letter being assigned the
shortest code, and numbers by combinations of length five. In
natural language the study of the constraints in the number of
possible sequences of letters (or phonemes) and in word length
constitutes a part of semiology.
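
The arithmetic behind this view of Morse Code can be checked with a short sketch. It simply enumerates dot and dash sequences; the function name signal_sequences is an assumption, and nothing here reproduces the actual Morse assignments of codes to letters.

```python
from itertools import product

def signal_sequences(max_length):
    """List every dot/dash sequence of length 1 up to max_length."""
    sequences = []
    for length in range(1, max_length + 1):
        for combo in product(".-", repeat=length):
            sequences.append("".join(combo))
    return sequences

# 2 + 4 + 8 + 16 sequences: enough for the 26 letters of the alphabet.
up_to_four = signal_sequences(4)

# 2**5 sequences of exactly length five: enough for the ten digits.
length_five = [s for s in signal_sequences(5) if len(s) == 5]
```

Sequences of up to four signals number thirty, which covers an alphabet of twenty-six letters with little to spare, and this is the sense in which the code approaches an economical use of the available combinations.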

These coding principles are pertinent to the study of
meaning, their application to which may be seen as a consequence of
Martinet's economy theory. In it the evolution of language is
claimed to be governed by two forces, man's inertia and man's
communication needs, from which two kinds of economy follow. One
called syntagmatic economy consists of the reduction of the length
of a word (or lexical unit) which usually expresses a frequently used
concept. An example of this economy is the replacement of 'machine à
laver', a long form, by Bendix, a shorter form. Paradigmatic economy,
which takes place when a concept does not occur often in the language,
consists of absorbing new concepts into a language without additions
to the vocabulary, although at the expense of longer items in the
text. The combining of machine, à and laver into the lexical unit
'machine à laver' before the arrival of Bendix was an instance of this
type of economy. These economies affect computational linguistics.
The presence of syntagmatic economy makes it necessary for words (or
lexical units) to be classified by an elaborate structure of semantic
components in order that a mechanical intelligence may understand a
text. For example, the connection between words such as chair (a case
of syntagmatic economy, since it is a reduction of 'something one sits
on') and sit has to be shown by components that represent the meaning
of chair as 'something one sits on'.

1.1.1.3 The third branch of quantitative linguistics, the economics
of mechanical translation, involves empirical examinations of the
immediate environments of ambiguous forms for an approximate
resolution of them. Van Buren in his definition of lexical items
('multiverbal items' as he calls them) is groping along these lines.
He defines multiverbal items as combinations of words at least one of
which totally predicts, in certain environments, the occurrence(s) of
the other word(s). For example, the word hot in 'hot dog' predicts
the occurrence of dog or more precisely the specific meaning of dog
(dog meaning sausage), and consequently dog is disambiguated by hot.
Upon such predictability depends Booth's function number technique,
which will be described in Section 1.3.2.1.1.

The application of statistical semantics to the problem of
multimeaning was advocated by Weaver in 1947. He envisaged not just
the scanning of the words surrounding an ambiguous one, but a complete
investigation to find out which part of the context was most useful in
reducing ambiguity and at what point increasing scans brought
diminishing returns.

An actual investigation has been made by Kaplan. With
ideas similar to those of Weaver he compares the effectiveness of
the immediate context in reducing ambiguity with that of the whole
sentence. He initially speculates that the effect of context would be
most marked on homonym ambiguities where, for example, blow meaning
'to blossom' is easily distinguished from blow meaning 'to pant' and
least marked where the different meanings of a word are most closely
related to each other, as in the case where blow can mean 'to produce
a noise by blowing', 'to pant or puff' or 'to talk loudly or
boastfully'. To test his hypothesis, the following procedure was
adopted: Translators were given ambiguous words, each of which was
assigned a list of possible meanings, and a series of utterances in
which those words appeared. The translator was instructed to select
the contextual meaning of an ambiguous word for each utterance. The
results of the experiment revealed the following information: the
word after the ambiguous one reduces multimeaning more effectively
than the word before it; two words on either side are almost as
effective as a sentence; words with many meanings are as effectively
reduced as those with only a few. In addition, lexical words were
far more effective than function words.

For the translations of a word between which the differences
in meaning are subtle, Pimsleur established 'trans-semantic frequency
counts'. These are frequency counts of target language translations
of a given source language word, which show the probability of
occurrence of each of them. By means of these counts one may
eliminate from the computer memory the translations least likely to
occur and thereby save on machine operations, albeit at the expense
of the quality of the translation. The remaining, high frequency
translations, the 'cover words', would be used instead of the
eliminated words to provide a coherent although stylistically
unrefined translation. Thus while 'the roof is laden with snow' is
the idiomatic translation of the German sentence, 'Das Dach ist
schwer von Schnee', the machine would be programmed to provide the
translation 'the roof is heavy with snow', nonetheless, to avoid the
extra machine operations needed to decide when heavy should be used
and when laden.
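
The selection of cover words from such counts may be sketched as follows. The counts for the renderings of schwer are invented for illustration and are not taken from Pimsleur; the function name cover_words and its keep parameter are likewise assumptions.

```python
from collections import Counter

def cover_words(translation_counts, keep=1):
    """Keep only the `keep` most frequent target-language renderings."""
    ranked = Counter(translation_counts).most_common(keep)
    return [word for word, _ in ranked]

def probabilities(translation_counts):
    """Turn raw counts into estimated probabilities of occurrence."""
    total = sum(translation_counts.values())
    return {word: count / total for word, count in translation_counts.items()}

# Hypothetical counts of English renderings of German 'schwer'.
counts = {"heavy": 70, "difficult": 25, "laden": 5}
kept = cover_words(counts)
```

With keep set to one, every occurrence of schwer would be rendered by the single most frequent translation, which is the trade of quality for fewer machine operations that the passage describes.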

Reifler and Mersel similarly adopt the criterion of
frequency of occurrence in their classification of utterances as
idioms. Theoretically whole sentences could be treated as such, but
their number would be infinite. The stock of idioms set up for
mechanical translation will, therefore, usually consist of short
phrases set up in consideration of the target language (TL). In
Reifler's example the English phrase 'the fundamental idea'
corresponds to an acceptable literal German translation, 'die
grundlegende Idee', and does not, therefore, have to be classed as an
idiom to meet the minimum requirements of translation. However, the
phrase would be so classified where the more idiomatic translation
'der Grundgedanke' is desired, namely, in the type of texts in which
the phrase occurs often.
The economies provided by statistics have a place in the
setting up as well as the use of the dictionary. Statistical data
help one to determine how large a dictionary must be in order to
contain all the words most likely to be needed for a given subject
and how to arrange entries in order of frequency to reduce searching
time. Parker-Rhodes' statement, however, that statistics is only
marginally useful in establishing procedures for mechanical
translation, although of value in their application once they have
been established, expresses an appropriate general impression.

1.1.2 Studies of the distribution of words to place them into
syntactic slots have been made for a long time. For example, through
a study of the distribution patterns of sets of words like cassage,
cassement and cassation, the suffixes -ment, -age and -tion
although different in meaning may alike be categorised as nouns.

Likewise in lexicography the meaning of somewhat synonymous
words such as rompre, briser and casser, may be distinguished by
their distribution patterns. The degree of synonymity between the
words would be indicated by the similarity of the patterns, although
the same criterion may apply to antonyms, too. Dubois¹⁷ carries out
a case study with the adjectives aigu and pointu, both meaning sharp.
In it he pinpoints the semantic categories of the nouns followed by
those words and finds that among nouns admitting adjectives like
effilé or arrondi, pointu occurs where aigu can, but not vice versa.
Among nouns admitting the use of adjectives like sourd and perçant,
aigu appears where pointu can, but not vice versa. The word pointu,
therefore, is the generic term in the first case, aigu in the second.
While the distribution pattern technique does not show how the
different semantic content of each word is structured, it does reveal
subtleties of usage that a speaker of a language is not consciously
aware of, but which a computerised semantics may ultimately have to
take into account.
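
The test for the generic term implied by such distribution patterns amounts to a comparison of sets of contexts. The sketch below assumes that the nouns each adjective can follow have already been collected; the noun labels are placeholders invented for the example and are not Dubois' data, and the function name generic_term is an assumption.

```python
def generic_term(contexts_a, contexts_b, name_a, name_b):
    """Return the term whose contexts strictly contain the other's, else None."""
    if contexts_a > contexts_b:   # a occurs wherever b can, but not vice versa
        return name_a
    if contexts_b > contexts_a:
        return name_b
    return None                   # identical or merely overlapping patterns

# Placeholder noun labels standing for one semantic category of nouns.
pointu_nouns = {"noun1", "noun2", "noun3"}
aigu_nouns = {"noun1", "noun2"}
result = generic_term(pointu_nouns, aigu_nouns, "pointu", "aigu")
```

Within the category represented here pointu would be returned as the generic term; with the sets exchanged, as for the second category of nouns in the passage, the answer would be aigu.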

Methods borrowed from psychology have been used to
determine a word's meaning through a listing of its paradigmatic
contexts. One such method is factor analysis described by Barthes,
in which a word is defined by its proximity in meaning to one member
of each pair of antonyms. Usually the proximity is measured on a
seven point scale. Thus if visa were the word to be defined and
authorisation and ban constituted one of several antonym pairs, the
word's total synonymy with authorisation would be represented by a
rating of one and synonymy with ban by a rating of seven. Since visa
is not in fact totally synonymous with but merely more closely
related to authorisation in meaning than to ban, a rating of two
would probably be assigned for this pair of antonyms. Another pair
might be hot and cold. Since in this case visa is equally unrelated
to both antonyms, a rating of four, the midpoint of the scale, would
be assigned for this pair. This rating is misleading since its point
of reference is ambiguous. It is unclear whether a midpoint rating
means that visa is very much related or very much unrelated to both
words. Weinreich appropriately points out that factor analysis could
be useful for eliciting the affect of some words if a statistical
analysis were made of several people's responses, but that the method
would only incidentally reveal the core meaning of a word.

Another approach in which paradigmatic contexts are
considered is that of free association.²⁰ In it, several people are
presented with a word and are asked to state another word which it
reminds them of. The responses are then sorted out to eliminate
private meanings and frequency counts are made of each remaining
response. Of the remaining words the one elicited most frequently is
considered to be the one most related in meaning to the original word.
Unlike the factor analysis approach, this one reveals hyponymy
relationships, as when the specific term frigid, for example, is
observed to sometimes elicit the generic term cold, but almost never
vice versa.
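
The tallying of responses in free association can be sketched as below. Treating a response given by only one person as a private meaning is an assumption made for the example, as are the function name strongest_associate, the min_count parameter and the invented list of responses.

```python
from collections import Counter

def strongest_associate(responses, min_count=2):
    """Discard responses given fewer than min_count times; return the commonest."""
    counts = Counter(r.lower() for r in responses)
    shared = {word: n for word, n in counts.items() if n >= min_count}
    if not shared:
        return None
    return max(shared, key=shared.get)

# Invented responses to the stimulus word 'frigid'.
responses = ["cold", "cold", "icy", "cold", "icy", "grandmother"]
associate = strongest_associate(responses)
```

Running the same tally with cold as the stimulus and comparing how often frigid is returned would expose the asymmetry that the passage takes as a sign of hyponymy.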

1.2 Information Retrieval

Statistical methods are applied to some aspects of
"information retrieval", a term used to describe a wide area of
activity. Sharp defines it as follows: "It is generally taken to
embrace the whole field of the problem of recovering from recorded
knowledge those particular pieces of information which may be needed
at particular times for particular purposes...." Information
retrieval may be subdivided into two principal areas, fact retrieval
and document retrieval. This dichotomy is not always clearly
recognised in the use of terminology in the literature. Document
retrieval is concerned with the problems of selecting a document on a
given subject area from an already classified series of documents,
and fact retrieval with the problem of summarising the content of
documents to make them amenable to classification. The two areas
complement each other, but present different problems.

In a fully automated document retrieval system, the actual
encoding of a user's request expressed in a natural language would be
performed mechanically in the retrieval of a document, as well as the
matching of this code with the codes of existing documents. However,
while a lot of research has been directed towards the organisation of
knowledge, there has been no attempt to show how it relates to the
organisation of language. Document retrieval will be treated in
chapter 3 in an examination of classification schemes.

The scope of fact retrieval varies according to the
researcher. A summary of a text (an abstract) by machine may be
expressed in natural language or in a notation. In the encoding of a
user's request two considerations are involved. One pointed out by
Luhn and Salton is that a user may be more interested in what is
original in a text than in what general subject it comes under. The
other has to do with the type of user. For example, a marine
biologist may need a different summary of a text on fish from a
fisherman. There are two varieties of fact retrieval, derivative
indexing and assignment indexing. The former is based on the
principle behind the human indexer's technique of underlining
important words in that a summary is derived from the words of the
text itself. In assignment indexing, on the other hand, a summary is
not directly formed from such words, but from a notation that
interprets them. Of the two forms of fact retrieval derivative
indexing is the simpler one and, according to Coyaud and
Siot-Decauville, has been in existence the longest.

1.2.1 Edmundson and Wyllys suggest that in derivative indexing
words, to which a significance rating is assigned, may be selected
according to positional, semantic, or pragmatic criteria. A
positional criterion would be said to apply if the first sentence of
each paragraph, for example, formed part of a summary or if words in
text were rated significant on the basis of their occurrence in
titles, for example, where a writer might be held to choose his words
with great care. A semantic criterion would be said to be employed if
a semantic categorisation of words of the type summary and conclusions
were utilised; here the significance rating of a word would depend on
how comprehensive it was. A pragmatic approach is said to be adopted
when criteria are invoked which do not directly arise from the text,
such as the occurrence of the names of specialists in a field.
Whichever criteria are invoked, derivative indexing tends to come
within the province of quantitative statistics.

1.2.1.1 In his key-word-in-context method (KWIC) Luhn attempts to
base indexing on positional and statistical criteria on the hypothesis
that the frequency of a word, since writers tend to repeat words as
they advance their argument, and its position in a sentence are
important for determining its significance. Words so graded as to
their significance would constitute a pattern representative of the
content of a text and texts having to do with similar topics would
possess similar patterns. Two immediate drawbacks may be pointed out.
First, many words such as and and the, which occur frequently, are not
useful for indexing and will have to be so designated by inclusion in
an antidictionary. Secondly, writers tend to use synonyms for
stylistic variation, which reduce the chances of important words
appearing prominently on a frequency list.
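
The frequency side of this method, together with the antidictionary, can be sketched as follows. The sketch leaves out the positional criterion altogether and is not Luhn's own procedure; the contents of the antidictionary, the function name significant_words and the sample text are assumptions.

```python
import re
from collections import Counter

# Assumed antidictionary: frequent words excluded from indexing.
ANTIDICTIONARY = {"and", "the", "of", "a", "in", "to", "is"}

def significant_words(text, top_n=3):
    """Rank the words of a text by frequency, ignoring the antidictionary."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in ANTIDICTIONARY)
    return [word for word, _ in counts.most_common(top_n)]

# Invented sample text.
text = ("The dictionary and the entry: the dictionary lists "
        "the entry of the dictionary.")
index_terms = significant_words(text, top_n=2)
```

The second drawback noted above is visible in the design: a synonym such as lexicon would be counted separately from dictionary, so that neither might reach the top of the list.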

A refinement of Luhn's statistical approach is suggested by
Edmundson and Wyllys, who recognise that a term that is sufficiently
rare in general usage might not occur often enough to rank high in
the frequency count of words, even in a text where it is important.
Since according to information theory it would have a high content of
information, it would therefore be important in indicating the subject
matter of a text. To identify this type of word, the ratio between a
word's frequency in a specific text and its frequency in general
might be examined.
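
One way of reading this ratio is as a quotient of relative frequencies. The sketch below is an interpretation under that assumption rather than Edmundson and Wyllys' own formula; the counts are invented and the function name relative_frequency_ratio is hypothetical.

```python
def relative_frequency_ratio(word, text_counts, general_counts):
    """Relative frequency of a word in one text divided by that in general usage."""
    text_total = sum(text_counts.values())
    general_total = sum(general_counts.values())
    if text_total == 0 or general_total == 0:
        raise ValueError("both frequency tables must be non-empty")
    text_rate = text_counts.get(word, 0) / text_total
    general_rate = general_counts.get(word, 0) / general_total
    if general_rate == 0:
        # Word unseen in general usage: unbounded ratio if it occurs in the text.
        return float("inf") if text_rate > 0 else 0.0
    return text_rate / general_rate

# Invented counts: a 1000-word text and a 1,000,000-word general sample.
text_counts = {"the": 50, "chromosome": 5, "other": 945}
general_counts = {"the": 60000, "chromosome": 10, "other": 939990}
rare_ratio = relative_frequency_ratio("chromosome", text_counts, general_counts)
common_ratio = relative_frequency_ratio("the", text_counts, general_counts)
```

In these invented figures the rare word occurs only five times in the text yet has a ratio of about five hundred, whereas the far more frequent function word has a ratio below one.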

For the above mentioned type of term, "special sets of
reference frequencies for special fields of interest"²⁸ would be
kept. For each field a vocabulary of words is compiled and the
frequency of occurrence of each word in a statistical sample of texts
that belong to the field is calculated. In addition, the total
number of words in the sample is counted so that the percentage of
words that each word of the vocabulary represents may be calculated.
By calculating the percentages for words in a particular passage one
may determine its field. Certain percentages for words such as
training, memory and behavioural, for example, would indicate that
a given passage belonged to the field of psychology. If the
percentages for a minority of other words in the passage failed to fit
the pattern for psychology, this fact would be taken as an indication
that another field was involved. The frequent occurrence of the word
chromosome, for example, in a passage belonging to the field of
psychology might suggest that it was about the hereditary factor in
human psychological makeup. The above refinement in statistical
procedure is the making of the notional and idioglossary
approaches, which will be discussed in section 1.2.2.1 and 1.3.2.1.3.

1.2.1.2 In Harris's³⁰ string analysis positional criteria are used for
indexing. Each sentence is analysed into a formal centre and the
right and left adjuncts. Words in the formal centre are considered
significant and those in the adjuncts redundant. In the sentence
'Today automatic trucks from the factory which we just visited carry
coal up the sharp incline' the formal centre, 'trucks carry coal',
would form the extract. Other examples, however, support Coyaud's
contention that the formal centre does not always contain the most
important information. In Noel's sentence, 'Additional information
concerns availability of microfilm services' the main topic is found
in the phrase, 'microfilm services', which is an adjunct.

The 'Sentence Dictionary' of Earl and Robison,³² like string
analysis, is based upon the hypothesis that topic sentences may be
identified by their structure. To classify indexable sentences, a
large sample from a total of nine chapters out of books selected at
random from the Palo Alto library was sorted according to structural
type. In the first analysis a sentence was counted as a sequence of
parts of speech, but the more than 3,000 types of sentence discovered
were found to be too many for computation. Subsequently sentences were
counted as sequences of phrases to reduce the number of types. The topic
sentences found in the books were in addition placed in an index.
Since at the time of writing the dictionary was still in the
experimental stage, it is not known whether or not the topic sentences
in their structure form a distinct grouping. However, even if the
'Sentence Dictionary' is only partially successful it will still be
of value to derivative indexing.

1.2.2 In assignment indexing the notation (or documentary
language, as Coyaud and Siot-Decauville call it) expresses the
relationship between synonymous utterances for application in what
Salton calls language normalisation programmes. Of those that
operate on sentences there are two well-known types. In one, the aim
is to reduce complex syntactic constructions to a group of equivalent
simple kernel sentences with a specified canonical pattern such as
the noun and verb one. Rigorous rules, however, have not been
formulated to carry out the aim. The other type is the
transformational approach, in which surface structures such as 'the
man eats the food' and 'the food is eaten by the man' are recognised as
equivalent through an analysis of the active and passive voice.
Below the sentence level the thesaurus approach, in which words in a
text are replaced by corresponding thesaurus heads, is a form of
language normalisation in which synonyms are eliminated and redundant
words are ignored, although information is lost in the type of
thesaurus in which a specific term is replaced by a generic one.

1.2.2.1 Luhn advocates a thesaurus approach based on statistical
criteria to colligate the various levels of specificity on which
authors express similar ideas. Such a thesaurus would consist of
'notional families' (groups of words related in meaning) compiled by
experts in the field from which the texts to be indexed are drawn.
Each word in a text would be assigned a keyword or a concept number
according to the family to which it belongs. The existence of a
notion would depend on its likely frequency of use. Since in the
field of electricity, for example, the words subsumed under the
notional category electricity would predominate about equally in most
of the texts, the words would be partitioned into more specific
categories. At the other extreme the notion butterfly in texts on
electricity would probably appear too rarely to discriminate between
texts, so that a more generic notion like insects, or even 'living
things', would be more appropriate.

The thesaurus having been compiled, the words in a passage
are analysed and frequency counts are made of the notions. The most
frequent ones, which are considered to be the most representative of
its content, form a 'mechanically prepared notional abstract', an
encoded summary. A refinement of the notional family approach might
include a supplementary index by which to regroup words under
different notions according to context. If the word butterfly
occurred frequently, for example, in a passage on electronic
butterflies and on the basis of its usual frequency the notional
category insects had been set up, the computer might be programmed to
replace it with the category butterfly. The set of notional families
would thus constitute a general classification with criteria for
pigeon-holing texts by machine.
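
The counting step behind a notional abstract can be sketched as
follows. The word-to-notion table, the sample passage and the cut-off
of two notions are all invented for illustration; Luhn's own families
and thresholds are not reproduced here.

```python
from collections import Counter

# Hypothetical notional families: each word is mapped to the notion it belongs to.
THESAURUS = {
    "current": "current", "ampere": "current",
    "voltage": "potential", "volt": "potential",
    "butterfly": "insects", "moth": "insects",
}

def notional_abstract(words, thesaurus, size=2):
    """Return the `size` most frequent notions found in the passage."""
    notions = Counter(thesaurus[w] for w in words if w in thesaurus)
    return [notion for notion, _ in notions.most_common(size)]

passage = "the current in ampere depends on the voltage while current flows".split()
abstract = notional_abstract(passage, THESAURUS)
```

Words outside the thesaurus contribute nothing to the counts, so the
abstract is only as good as the coverage of the notional families.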

1.2.2.2 In the interlingual approach to language of the Cambridge
Language Research Unit (C.L.R.U.) thesaurus heads having a wider scope
than Luhn's notional categories were set up with a view to making
mechanical translation, library (document) retrieval and mechanical
abstracting amenable to the same treatment. Masterman, Needham and
Sparck-Jones claim that "the very nature of the problem of
interlingual mechanical translation is like that of information
retrieval in that it demands a general, that is, a logical approach".
The "logical approach" consists of linking the various surface
structures of languages to a common deep structure, which constitutes
an interlingua.

One function of the thesaurus is to resolve multimeaning.
Accordingly, concept numbers, which represent heads in Roget's
Thesaurus, are set up and words are listed under them, ambiguous
words being placed under several concept numbers. For the word plant
in Masterman's³⁷ example there are three concept numbers, 184, 300 and
367, depending on whether it means to place, to insert or a vegetable
respectively in a given context. If the context includes another
ambiguous word, flowering, for example, which may be found under the
numbers 5, 161 and 367 depending on whether it means essence, produce
or vegetable respectively, the concept numbers for both words would be
matched against one another. Since the number 367 is possessed by
both words, it is accepted as representing their contextual meanings.
In providing explicit indices by which to make disambiguation
computational the thesaurus approach of C.L.R.U. is the starting
point for a comprehensive computerised dictionary, the construction
of which will be discussed in chapter 4.
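
The matching of concept numbers amounts to a set intersection, as the
following sketch suggests. The table holds only the two words of the
example; only the shared number 367 matters to the result, and the
function name is an assumption made for the illustration.

```python
# Concept numbers for the two words of the example, with the sense each number stands for.
CONCEPTS = {
    "plant": {184: "to place", 300: "to insert", 367: "vegetable"},
    "flowering": {5: "essence", 161: "produce", 367: "vegetable"},
}

def shared_concepts(word_a, word_b, concepts=CONCEPTS):
    """Concept numbers possessed by both words, taken as their contextual meaning."""
    return sorted(set(concepts[word_a]) & set(concepts[word_b]))

common = shared_concepts("plant", "flowering")
senses = [CONCEPTS["plant"][number] for number in common]
```

When the intersection is empty or contains several numbers the method
gives no decision, so a wider context would have to be consulted.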

The thesaurus heads represent not only words but also stems
and endings ("chunks" in their terminology). A chunk is "the smallest
significant language-unit which can exist in more than one context,
and which for practical purposes, it pays to insert as an entry by
itself in an MT dictionary". The Italian word piantatore, for
example, consists of three as follows: piant, at and ore. In a
monolingual thesaurus a chunk may be represented in the form of a
tree and when it is connected with a tree of another language for
translation, "the two trees together form a lattice each point of
which looks both ways and is itself a translation point".

The greatest challenge to the interlingual approach is the
representation of syntax. Parker-Rhodes has found that because
part of the meaning of a sentence is conveyed by the choice of words
(lexically) and part by the manner of their combination
(syntactically), to a different extent in different languages, it is
difficult to make translation computational. To relieve the
difficulties of word order he advocates the use of affixes to replace
syntactic structure for conveying information. For example, in the
phrase 'race horse' the role of the word order, which contrasts it with
'horse race', would be taken over by a role-indicating affix added to
one of the words. Whether the representation is 'race-R horse' or
'horse race-R' would depend upon the word order of the target
language.

Richens represents syntagmatic relations by means of a
"semantic net" that links concepts ("naked ideas") that are not
specific to any individual language. The sentence, 'the dog bites
the cat', for example, is represented by the following
two-dimensional arrow structure:

    dog --- part of --- teeth ---> contact <--- cat
                                      |
                                    much

The concepts are designated by words simply as a convenience to the
observer. The diagram is essentially an explicit paraphrase of the
original sentence inasmuch as it could also be said to represent the
sentence, 'the dog's teeth have much contact with the cat'. The
dependency links indicated by the arrows appeared later in Schank's
research. At the time of writing Richens had found no general
mechanical procedure for extracting semantic nets from a text.

Parker-Rhodes' "interlingual formulae" resemble Richens'
"naked ideas" except for the format of binary bracketing, which
determines the surface structure that the formulae will take in a
given language. For the clause 'dexterity which cheats the eye', for
example, the formulae are as follows: (((eye cheat) type) (hand
skill)). By rearranging the brackets according to given rules of
interlingual grammar one may represent various paraphrases of the
clause. According to one rule, which operates on the word type (a
"weak element" of the formulae), ((eye cheat) type) may be transformed
into the synonymous construction (eye (cheat type)), which represents
the phrase 'eye-cheating dexterity'. According to another rule, a
redistribution of thesaurus heads allows for the paraphrase 'visually
deceptive'. In order to implement these rules a computerised lexicon
(or thesaurus) is necessary to show how the formulae are created by a
step by step collation of the dictionary entries for each word.
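
The bracket rearrangement in the example can be pictured with nested
tuples. The sketch below shows only the shift from ((a b) c) to
(a (b c)) when c is a weak element; the set of weak elements and the
function name are assumptions, and Parker-Rhodes' actual rules are not
reproduced.

```python
# Hypothetical set of "weak elements"; only `type` is named in the example.
WEAK_ELEMENTS = {"type"}

def shift_brackets(formula):
    """Rewrite ((a, b), c) as (a, (b, c)) when c is a weak element."""
    left, right = formula
    if isinstance(left, tuple) and right in WEAK_ELEMENTS:
        a, b = left
        return (a, (b, right))
    return formula  # the rule does not apply, so the formula is left unchanged

before = (("eye", "cheat"), "type")
after = shift_brackets(before)
```

Applying the function to the left-hand part of the full formula leaves
the (hand skill) part untouched, as in the paraphrase described above.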

1.3 Mechanical Translation without Interlingua

While the C.L.R.U. regarded translation as a two-step process
involving the representation of a source language by an interlingua
which provides output in a target language, specialists in mechanical
translation alone have looked at it as a one-step process, in which
the source language is translated directly into a target language.
Various nonlinguistic models have been suggested for this one-step
process.

1.3.1 One is Weaver's cryptography analogy, which is based upon
the observation that by making frequency counts of letters and
combinations of letters for a given language, one can decode a message
written in it. In a letter to Norbert Wiener (1947, March 4th), he
suggests treating cryptography and mechanical translation analogously
so that a text in Russian, for example, would be visualised as an
English one coded in strange symbols. The value of the analogy is
limited, since only at the sentence level is a one-to-one correspondence
likely to occur between the utterances of two different languages, and
the number of possible sentences in a language is infinite.

Another model, adopted by Nida, Jakobson and Yngve
independently, is based on communication theory. A few quotations
will throw light on the value of the model. In making the point that
translation is not word-for-word but thought-for-thought Jakobson
says "translation from one language into another substitutes
messages in one language not for separate code-units but for entire
messages in some other language". In a more elaborate use of
communication theory terms, Yngve claims that "The function of the
message source is to select the message from among the ensemble of
possible messages. The function of the rules of the code or codebook
is to supply the constraints of the code to which the encoded message
must conform. . . . The function of the decoder is to recognise the
features of the encoded message that represent the constraints of the
code, remove them, and supply the destination with a message that is
a recognisable representation of the original message". In this
passage the use of the phrase "rules of the code or codebook", for
example, instead of 'grammatical rules' serves to emphasise that
natural and artificial languages may be analysed by the same linguistic
methods. This recognition of parallels between linguistics and
communication theory is of cross-disciplinary interest. However, it
does not provide any insights into mechanical translation.

Ceccato's model concerns the mentalistic processes behind
linguistic performance. According to him, translation is possible
only if, by semantic connections, the SL text can be replaced by the
thought that it represents, which, in turn, can be replaced by the TL
text. The model shows the step by step construing of dictionary
entries by which a human understands an utterance. Upon finding that
the first word in a sentence is the or tree, the translator sets up
in his mind a list of the types of words that may follow, a
"correlational structure". Upon finding the second word he links it
with the first to form a "correlational net". As additional words in
the sentence are fitted into the net, the structure of it is adjusted
accordingly. For the sentence 'John she loves' of dubious
grammaticality the translator tests the binary connections, 'John
loves', 'loves John' and 'she loves' and by a process of elimination
selects the second two as the valid ones. An examination of the
pairs reveals John to be the object, loves the verb and she the
subject of the sentence and a correlational net is formed accordingly,
which Ceccato illustrates with squares containing dots. The dots in
the lower squares represent the words loves John, the dot in the
upper square, she. The replacement of dots and squares by lines
would make it apparent that correlations are in fact immediate
constituent analyses. In Ceccato's model new terminology is applied
to old techniques.

1.3.2 The first empirical linguistic attempt to grapple with the
problems of MT was the Georgetown-IBM experiment carried out by
Dostert and others in translating Russian into English. MT was
divided into two operations, one of selection, in which lexical data
is handled to produce the correct TL word, and manipulation,
under which word order was subsumed. The penetration of the problems
was not deep, since the experiment was on a small scale. Although
dependence upon post- and pre-editing was eliminated, the dictionary
consisted of fewer than 250 terms and no more than two English
equivalents were assigned to each ambiguous word.

The programme dealing with the dictionary component was
divided into five operations. The first covered SL and TL words that
were in one-to-one correspondence with each other. The second treated
multimeaning problems that could be solved by examining the word
before the ambiguous one. The third handled those that an examination
of the word after the ambiguous one could solve. In the fourth
operation words in SL that were superfluous in TL were omitted. In
the fifth, terms missing in SL that were required in TL were added.

In this early experiment, then, the criteria for translating
were based on the physical rather than on the structural context of
utterances. This distinction may be observed in an analysis of the
sentences, 'I painted the white wall' and 'I painted the wall white'.
A translation by structural context would take into account the
difference in IC structure between these sentences and Schank⁵⁰ and
Weinreich (chapter 2, 2.2.2 and 2.2.3.2) would, in addition, relate
it to the organisation of non-linguistic knowledge. But a translation
by physical context would merely take into account the difference in
the word order of wall and white.

1.3.2.1 Mechanical translation in the 1950s, following the
Georgetown experiment, continued to be based on physical context.
Further research produced a miscellany of ambiguity types: predicate
block structure and inflectional ambiguities, homographs, orthographic
coincidences, and contextual and punctuation problems.

1.3.2.1.1 The first two types of ambiguity⁵² concern words that have
one meaning but many possible translations. Ambiguities in a
predicate block structure are said to occur when a word has one
meaning but many possible translations into the target language, due
to the syntactic relations into which it may enter. The Russian word
sdelano, for example, is such an ambiguity as it may be translated
into English as done, is done or as be done depending on whether it
occurs with an auxiliary verb, with no subject or auxiliary or with
pust' respectively. Since is and be are function words, they may
alternatively be relegated to syntactic analysis so that done remains
as the translation of sdelano. Ambiguities in predicate block
structure are only classifiable as ambiguities because syntactic
analysis was based in the 1950's on the physical and not the
structural configurations of words.

Inflectional ambiguity has to do with morphology and is said
to occur when the number, gender or case of a word is not clear. In
the Russian example of Janiotis and Josselson the word stanchii is
such an ambiguity because it may be genitive, dative or locative
singular, or nominative or accusative plural. Inflectional ambiguity
covers Reifler's distinction between monogenetic and polygenetic
meaning. In his German example, the word aus (out of) is said to have
monogenetic meaning because it can take one case only, the dative.
The word dieser (this) has polygenetic meaning since this form may
have three meanings depending on whether it is singular and masculine
nominative or a feminine genitive or dative or is a genitive plural.
From this example it may be observed that "inflectional ambiguity" is
synonymous with "polygenetic meaning".

Homographs and orthographic coincidences are words that
have many unrelated meanings. The latter type of ambiguity in
addition covers such words that belong to the same grammatical class.
The Russian verb plachu (I weep or I pay), which is inflected from
both plakat' (weep) and platit' (pay), is an orthographic coincidence.
On the other hand dam, which may be either the first person singular
of dat' or the genitive plural of dama, is simply a homograph. The
ambiguity types mentioned so far do not appear to have been formulated
according to the types of computation involved. In such a formulation
predicate block structure, inflectional ambiguity and homographs would
be categorised as types of ambiguity that can be resolved by parsing.
Orthographic coincidences would be grouped with contextual problems
as types that cannot be so resolved.

Contextual problems are types of ambiguity that make a
computerised semantics necessary. An example of such a problem is the
word board, which may mean piece of wood, food (as in 'room and
board'), 5iil2£, council (as in 'board of directors') or an action as
in 'to board a train'. While this last meaning can be identified
because the word is a verb, board in all its other meanings is a noun.
In some cases a whole sentence will not solve this type of problem.
In Lukjanow's example, the Russian sentence 'Ia zaplatila za stol'
('I paid for the table/board') stol can be translated as table or
board. The limitation of disambiguation to within one sentence may
have been convenient for early attempts at mechanical translation and
within the tolerable limits of inaccuracy according to Kaplan's study,
but it does not have linguistic basis.

The adoption of one pragmatic criterion in disambiguation,
that of limiting the context under examination to within one word,
allows direct syntactic links to be utilised. Such links are
represented by concept numbers in Booth, Brandwood and Cleave's
method. This approach is similar to the one used by Masterman
(section 1.2.2.2), except that in hers ambiguous words are scattered
among several concept numbers and are looked up by means of an
alphabetised cross-reference dictionary, while in Booth's method the
different numbers representing a word's meaning are listed together.
This technique was applied mainly to prepositions, which occur
frequently. One such preposition is the German word auf, which may
appear in the phrases 'auf dem Tisch', 'auf dem Tanz' and 'auf dem
Lande', meaning respectively 'on the table', 'at the dance' and 'in
the country'. In Booth's notation the various possible translations
of "auf" are represented as follows: ("auf"=1)on, 2)at, 3)in) and
correspondingly the German nouns as follows: ("Tisch"=1)), ("Tanz"=2)),
("Lande"=3)). The matching of numbers provides the translation. In
the case of 'auf dem Tisch', the concept number for Tisch is found
to be identical to the number for auf meaning on, so that the
translation 'on the table' is given. By the application of such
numbers on a larger scale, the correct preposition may be supplied for
every noun in English.
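
The number matching for auf can be sketched as a pair of look-up
tables. The table layout and the function name are assumptions made
for illustration; only the three pairings come from the example above.

```python
# Numbered translations of the preposition, as in ("auf"=1)on, 2)at, 3)in).
PREPOSITIONS = {"auf": {1: "on", 2: "at", 3: "in"}}

# Each noun carries the number of the translation it selects, plus its English gloss.
NOUNS = {"Tisch": (1, "table"), "Tanz": (2, "dance"), "Lande": (3, "country")}

def translate_phrase(preposition, noun):
    """Pick the preposition's translation whose number matches the noun's."""
    number, english_noun = NOUNS[noun]
    return f"{PREPOSITIONS[preposition][number]} the {english_noun}"

phrases = [translate_phrase("auf", noun) for noun in ("Tisch", "Tanz", "Lande")]
```

Because the choice is recorded on the noun, the context examined never
extends beyond the one word that follows the preposition.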

The representation of syntagmatic links by means of concept
numbers is a means of detecting idioms. So applied, such numbers are
called by Booth, Brandwood and Cleave function numbers. These
uniquely represent the words that can be part of a particular idiom.
When a word with such a number is detected, the words following it in
the text are tested for possession of the same number. For example,
the words il, y and a of the French idiom 'il-y-a' might each be
assigned the function number 1. Upon finding il in a text the computer
searches the words following il. If y and a follow, the idiom
translation, 'there is', is supplied. Otherwise a literal translation
is assigned by default.
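
A minimal sketch of idiom detection by function numbers follows. The
literal glosses, the table layout and the function name are invented
for the illustration; only the idiom 'il-y-a' and its translation are
taken from the example.

```python
# Words that may belong to an idiom carry that idiom's function number.
FUNCTION_NUMBERS = {"il": 1, "y": 1, "a": 1}

# For each function number: the full idiom and its translation.
IDIOMS = {1: (("il", "y", "a"), "there is")}

# Hypothetical literal glosses used when no idiom is found.
LITERAL = {"il": "he", "y": "there", "a": "has", "mange": "eats"}

def translate(words):
    """Supply the idiom translation where the whole idiom follows; otherwise gloss literally."""
    output, i = [], 0
    while i < len(words):
        number = FUNCTION_NUMBERS.get(words[i])
        if number is not None:
            idiom, gloss = IDIOMS[number]
            if tuple(words[i:i + len(idiom)]) == idiom:
                output.append(gloss)
                i += len(idiom)
                continue
        output.append(LITERAL.get(words[i], words[i]))
        i += 1
    return output
```

For instance, `translate(["il", "y", "a"])` yields the idiom
translation, while `translate(["il", "mange"])` falls back on the
literal glosses.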

Pragmatic solutions to translation problems include the
manipulation of punctuation. In German, capitalisation is usually an
explicit criterion for disambiguation, since it is applied not only to
words at the beginning of a sentence but also specifically to nouns.
The comparative dichter (tighter) thus differs in form from the noun
Dichter (poet). This distinction does not apply, however, at the
beginning of a sentence. In 'Dichter ist der Hahn (faucet/cock)
geworden' ('The faucet/cock has become tighter/a poet') Dichter
contributes to the ambiguity of the sentence. To receive the full
benefit of the German convention Reifler advocates the preservation
of capitalisation for nouns exclusively so that the first word of
this sentence would be dichter, from which the machine would derive
the translation tighter.

Various punctuation problems might disappear through
manipulation. In French the apostrophe and the dash are ambiguous
in that they sometimes separate two different words and sometimes two
parts of the same word. An apostrophe divides the single word
aujourd'hui and the two words l'or and similarly a dash, the single
word porte-clé and the two words vient-il. The rejection of the
convention that so separates two distinct words would resolve the
problem. However, while such a manipulation was a convenient stopgap
measure in the 1950's, it is no substitute now for effective procedure.
Parts of a word might be distinguished from complete words by Booth's
function numbers.

1.3.2.1.2 The stems and endings method, which Booth and Richens
first applied in 1947, was a way of segmenting phrasal compounds for
economy in the inventory of items in the lexicon. A group of many
vocabulary items like seaboard, seaside, seaway, board, way, boards
and ways would be atomised into fewer forms, sea-, -s, -board, -side,
-way, with increasing economy as more and more words are
partitioned. This approach may be applied bilingually too. In the
German words Musik and Direktor, 'k' would be segmented from the rest
of the word to implement the rule that 'k' becomes 'c' in a
translation into English. Hybrid compounds like Goldhandel (gold
trade), however, would not be amenable to the same treatment.

Against the economy in the inventory of elements in the
lexicon, especially in an inflected language like Russian, the
drawback of the additional complexity to the grammar of a language
necessary for generating words from stems and endings must be balanced.
Reifler indicated that it was on this account that he did not adopt
the method. At the time of his criticism, however, he had access to
the photoscopic disc, an improvement in technology that enabled a
computer to absorb a relatively large vocabulary.

A part of the complexity of grammar following from the stems
and endings technique lies in the careful setting up of them so that
one word may be partitioned by computer in as few ways as possible.
The opportunity to control the setting up of stems and endings exists
when letters or letter sequences can be part of either and thereby
constitute what Reifler calls an "X-factor". The Russian word
rybolovu (fisherman) contains one. The usual dissection of this
word is ryb-o-lovu, where 'o' constitutes a connector and lovu means
'to the catcher'. Since the existence of the free forms ryb
('of fishes') and olovu ('to the tin') makes the incorrect translation,
'to the tin of fishes', possible, rybolovu is divided for the purposes
of translation into rybo and lovu instead of into ryb and olovu.
The connector 'o' is the X-factor since it is the crucial element in
avoiding incorrect construing.

For some words the number of possible partitions into stems
and endings cannot be reduced because of inherent ambiguity. For
example the German word Wachtraum may be split into either Wacht
and Raum ('guardroom') or Wach and Traum ('waking dream'), the 't' being
attachable to both stem and ending. Since the ambiguity lies in
Wachtraum itself, partitioning must be based upon a resolution of the
word's meaning in context.
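
The number of ways in which a word can be partitioned depends on the
inventory of stems and endings, which the following sketch makes
explicit. The four-item inventory is limited to the Wachtraum example
and the function name is an assumption.

```python
def partitions(word, chunks):
    """Return every way of splitting `word` into a sequence of known chunks."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in chunks:
            # Keep this head and partition the remainder of the word.
            for rest in partitions(word[end:], chunks):
                results.append([head] + rest)
    return results

# Inventory restricted to the forms mentioned in the example (lower-cased).
CHUNKS = {"wach", "wacht", "raum", "traum"}
splits = partitions("wachtraum", CHUNKS)
```

Both dissections are returned, so the choice between them has to come
from the context rather than from the inventory. Removing an item such
as wacht from the inventory would leave a single partition, which is
the kind of control over stems and endings described for the X-factor.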

1.3.2.1.3 One of the first pragmatic attempts at resolving ambiguous
words in a text consisted of a categorisation of the various meanings
of each word according to subject area. A special dictionary
containing words so categorised was called an idioglossary (or
microglossary). Which idioglossary to apply to a text was determined
by indexing it either by machine or by a pre-editor. When it was first
introduced, it was a stopgap measure to prevent a computer's very
limited memory space from being wasted upon words not applicable to
the types of texts to which mechanical translation was applied.
Dostert probably had the concept of the idioglossary in mind in
1955, when he suggested that a "functional lexicon" be used "...when
a text in a given functional field area is being translated". The
word stream, for example, would be entered into two such areas, one
consisting of geographical terms and the other of engineering terms,
disambiguation then being dependent on the content of the text under
consideration.

The content of the text as a whole is determined by the
type of frequency counts of words made by Luhn (p. 12). In addition
subdivisions may be recognised so that where a text fits into two
subject areas two frequency counts may be made, one for the local
context of an ambiguous word and one for the whole text. Such a
procedure might be useful if a text had to do with the social
implications of atomic energy, for example.

The task of structuring a system of idioglossaries was
undertaken by Micklesen. Like Luhn's notional categories, his
system was arrived at intuitively with adjustments made through the
observation of statistical data. The notation for representing the
idioglossaries was decimal. Digits in the tens' column were reserved
for the major divisions of knowledge and those in the units' column,
for their subdivisions. A word considered to belong to mathematics
in general, for example, would be assigned a number such as 10. A
term belonging to a particular branch of mathematics would be
specified by the replacement of the digit zero. Thus the number 11
would designate an algebraic term and 12, a geometrical one. This
type of notation was not originated by Micklesen, but was in fact a
variation of Dewey's decimal classification. Whereas Dewey applied
it to retrieve documents, Micklesen designed his system to categorise
words.
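
The two-digit notation can be decoded mechanically, as in the sketch
below. The tables contain only the three codes of the example; the
table names and the function name are assumptions.

```python
# Tens digit: major division of knowledge (only the example's division is listed).
FIELDS = {1: "mathematics"}

# (tens, units): subdivision of the major division.
BRANCHES = {(1, 1): "algebra", (1, 2): "geometry"}

def decode(code):
    """Split a two-digit idioglossary code into its field and, if any, its branch."""
    tens, units = divmod(code, 10)
    field = FIELDS[tens]
    branch = BRANCHES.get((tens, units)) if units else None
    return field, branch

decoded = [decode(code) for code in (10, 11, 12)]
```

A units digit of zero marks a term of the field in general, so a word
coded 10 would be admitted to any text in mathematics, while 11 and 12
narrow the choice to one branch.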

Since Micklesen did not have access to a computer the words
he had categorised were checked in the manner of a machine against
actual texts to test the validity of his idioglossary system. He
found that 83% of the words were correctly assigned. These results
serve to emphasise the complexity of the organisation of knowledge
with which a computerised semantics must come to terms.

1.3.2.2 While the emphasis in mechanical translation in the 1950s
was on the use of the lexicon, its limitations were recognised. While
Perry's experiments revealed that a translation without a grammar,
when applied to scientific and technical material, was comprehensible,
members of the MIT school, including Bar-Hillel and Yngve, advocated
the systematic use of parsing to obviate the necessity for a
proliferation of ad hoc rules, like the one stating that if German der
follows a capitalised word with no intervening comma, 'of the' will be
the translation most of the time. The advantage of a parsing programme
was that rules set up for disambiguating a given word are equally
applicable to other words that belong to the same paradigm. The
method of parsing valid for the article der, which may be nominative,
genitive or dative, is equally valid for the words dieser (this) and
jeder (each), whereas ad hoc rules apply only to individual words.

Yngve's tenet that syntax should be handled before
selectional restrictions anticipates the main drawback of Katz and
Fodor's marker theory, a major development in the 1960's in making
semantics computational. Yngve says: "The selectional relations
between words in open classes, i.e. nouns, verbs, adjectives and
adverbs ... can be utilised by assigning the words to various meaning
categories in such a way that when two or more of these words occur
in syntactic relationships in the text, the correct meanings can be
selected". Before the meaning of the word plant, for example, can
be determined by that of flowering, it must first be ascertained that
plant is a noun and flowering an adjective. In order to represent
the semantic content of a word in a form useful for computation, the
word's syntagmatic structure must be indicated.

An attempt to formally represent syntactic relations was
made by Bar-Hillel in his categorical grammar. The goal was to
ensure first that all grammatical constructions were assigned the same
notation to contrast with that of ungrammatical ones and secondly that
an utterance belonging to a given part of speech and a word belonging
to the same one be identically encoded. The principles of
Bar-Hillel's notational system were based upon the rules of arithmetic
applied in the multiplication of vulgar fractions. The 's' and 'n'
combinations, of which the formulae representing the parts of speech
consisted, were set up in the form of denominators and numerators.
The symbol 'n' designated nouns, 'n/n', adjectives and 's/n', verbs.
The formula for a whole utterance is derived from a step by step
construing of the formulae of its parts. In the sentence 'Poor (n/n)
John (n) works (s/n)' the reduction of 'n/n . n' by 'cancelling out'
to 'n' represents the linking of Poor, an adjective, with John, a noun,
to form a noun phrase 'Poor John'. The connection of this phrase in
turn with works is represented by the reduction of the 'n' (for the
noun phrase) and 's/n' combination to 's'. This symbol designates
the sentence as grammatical. An utterance of the type 'Poor (n/n)
works (s/n)', for example, would be reduced to 's/n', which indicates
an ungrammatical sentence. Similarly 'works (s/n) John (n) poor
(n/n)', which is analysed to be 's . n/n', is so indicated.
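
The three reductions above can be reproduced with a small sketch. It
rests on assumptions not stated in the text: categories are combined
strictly from left to right, and a pair of neighbours cancels when the
numerator of one equals the denominator of the other. The function
names are hypothetical, and the rule is a simplification of the
thesis's arithmetic analogy rather than Bar-Hillel's full calculus.

```python
def parse(cat):
    """Split 's/n' into ('s', 'n'); a bare 'n' becomes ('n', None)."""
    num, _, den = cat.partition("/")
    return (num, den or None)


def combine(left, right):
    """Cancel a numerator against the neighbour's denominator.

    Returns the reduced category, or None if nothing cancels.
    (Assumed rule; not taken verbatim from the source.)
    """
    (a, b), (c, d) = left, right
    if d is not None and a == d:
        return (c, b)
    if b is not None and b == c:
        return (a, d)
    return None


def reduce_categories(cats):
    """Reduce a sequence of category formulae from left to right."""
    stack = []
    for cat in map(parse, cats):
        stack.append(cat)
        # Keep cancelling the two rightmost categories while possible.
        while len(stack) >= 2:
            merged = combine(stack[-2], stack[-1])
            if merged is None:
                break
            stack[-2:] = [merged]
    return [num if den is None else f"{num}/{den}" for num, den in stack]


print(reduce_categories(["n/n", "n", "s/n"]))  # Poor John works
print(reduce_categories(["n/n", "s/n"]))       # Poor works
print(reduce_categories(["s/n", "n", "n/n"]))  # works John poor
```

Under these assumptions a sequence counts as grammatical only when it
reduces to the single symbol 's'. A different order of cancellation
could give a different result for the third sentence, which is one way
of seeing how fragile the arithmetical framework is.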

The main drawback of the categorical grammar is that of
scale. By testing sentences with a transformational grammar, it may
be verified that the complexity of language is beyond what the
conventional categories such as noun, verb and adjective represent.
Personal experience reveals that the 's' and 'n' notation with its
arithmetical framework is overpowered by the demands of various types
of sentence construction. However, when stripped of the procrustean
framework, the grammar suggests an atomisation of the traditional
parts of speech into elements more useful for computation.

The attempts in the 1940's and 1950's to circumvent the task
of organising the semantic content of a word into a computerised
dictionary only succeed at all in quantitative linguistics. For
mechanical translation and information retrieval, the ultimate goal
is the resolution of the amphibology 'I shot the man with a gun' in
the sentence 'I shot the man with a gun, but if the man had had a gun
too, he would have shot me first'. Resolution requires the
recognition by language normalisation that 'if the man had had a gun
too' implies the man did not have a gun so that 'with a gun' is
observed to link with I and not man. A shorter range goal is the
resolution of ambiguity in a single word and paraphrase recognition
relatable to single words.
2 STRANDS OF SEMANTIC THEORY, 1960-1972

2.1 The Notation and Functioning of a Computerised Dictionary

Linguistic research in the 1960's has explicated the
difficulties to be faced in computerised semantics, but has provided
no model that overcomes them. Katz and Fodor attempted to create
one, but theirs barely suffices to disambiguate words inasmuch as it
fails to take into account their syntactic contexts. However, as a
semantic theory Katz and Fodor's model may be considered the nucleus of
research into computerised lexicography. In the realm of syntax,
investigation mostly centres on transformational grammar. For a
semantic model that covers the problems brought up by the linguists,
one must turn to the type of artificial intelligence developed by
Schank and others at Stanford.

2.1.1 Katz and Fodor envisaged the components of their dictionary
as concepts independent of the operation of natural language. A full
discussion of the constraints to which a language, natural or
documentary, is subject will be provided in chapter 3. At this point,
it may be said that the components by being called concepts do not
escape reference in terms of natural language. In fact, subsequent
discussion will reveal that they function syntagmatically as
adjectives and paradigmatically as antonyms. As a prelude, therefore,
to a consideration of Katz and Fodor's dictionary, it would be
appropriate to examine antonym and distinctive feature analyses, upon
which the setting up of it depends.
2.1.1.1 The beginnings of distinctive feature analysis may be traced
back to the Cours de linguistique générale prepared in 1913 from the
notes of De Saussure by his students. In it the valeur of a word is
claimed to be derived from its association with other words, that of
the English word sheep, for example, being different from that of the
French word mouton, because it contrasts with another word, namely,
mutton, which refers not to a live animal but to its meat. Since
1913, the value of the English word has changed. Three words, sheep,
mutton and mouton, now correspond to the French word. Accordingly the
relationship between them may be stated formally by attaching to
sheep, mutton, mouton (English) and mouton (French) the respective
sets of distinctive features, /+live, +ovine/, /-live, +meat, +ovine/,
/-live, +skin, +ovine/ and /±live, +ovine/. The analysis may be
extended to other words such as pig, pork, cow and beef, which may be
assigned the respective groups of features, /+live, +swine/,
/-live, +swine/, /+live, +bovine/, and /-live, +bovine/. Dictionary
entries thus encoded offer explicit indices for computational
analysis. The effectiveness of such distinctive features will depend
upon each of them being assigned a unique meaning.
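
As an illustration of what "explicit indices" could look like to a
program, the following sketch stores some of the feature sets listed
above as dictionaries and reports the descriptives on which two
entries differ. The data layout and the function name `contrast` are
assumptions made for the example, not part of the original analysis.

```python
# Each entry maps a descriptive to its indicator: True for '+', False for '-'.
LEXICON = {
    "sheep":  {"live": True,  "ovine": True},
    "mutton": {"live": False, "meat": True, "ovine": True},
    "pig":    {"live": True,  "swine": True},
    "pork":   {"live": False, "swine": True},
    "cow":    {"live": True,  "bovine": True},
    "beef":   {"live": False, "bovine": True},
}


def contrast(word_a, word_b):
    """Return the descriptives whose indicators differ between two entries.

    A descriptive present in only one entry also counts as a difference.
    """
    a, b = LEXICON[word_a], LEXICON[word_b]
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


print(contrast("pig", "pork"))      # differ only in 'live'
print(contrast("sheep", "mutton"))  # differ in 'live' and 'meat'
```

Such a comparison only works if every descriptive keeps a single
meaning across the whole lexicon, which is the condition stated above.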

In being a form of language normalisation distinctive
feature analysis will only incidentally represent words with the same
categories as traditional grammar. In Prieto's example, the
sentences 'Elle le regarde' and 'Elle la regarde', le and la are
respectively assigned the groups of features /+singular, +definite,
+3rd person, +masculine or neuter/ and /+singular, +definite, +3rd
person, -masculine or neuter/. As they differ with respect to a
single feature, gender, in the above contexts, they constitute what
Prieto calls "noèmes". Since in the sentences 'Elle regarde le
cahier' and 'Elle regarde la porte' the use of the wrong gender of
article can be detected and corrected with absolute certainty, unlike
in the first two sentences, the gender distinction is redundant.
Correspondingly the sets of distinctive features for le and la are
identical, both being /+singular/, /+definite/ and /+3rd person/. The
function of these articles is analogous to that of the two phonemes
/n/ and /ng/ in English. While in most environments they are
distinctive, before the phoneme /k/ in such words as income they are
not. In phonology what precedes /k/ is called an "archiphoneme".
Analogously the prefix archi is applicable to the articles le and la,
which thereby constitute an "archinoème".

In the examples of distinctive feature analyses presented
above the convention of plus and minus signs has been adopted to
reveal the explicit indices with which computational procedure has to
deal. In theoretical linguistics they are often not displayed but
left to human imagination. In Prieto's actual example the notation
did not consist of terms of the type /+masculine/ and /-masculine/
but of the type /masculine/ and /feminine/. Similarly Katz and Fodor
contrast /animate/ with /inanimate/ rather than /+animate/ with
/-animate/. For computation either a special table of antonyms or
the plus and minus convention is necessary. Since the latter would be
less complex to programme a computer with, it will be applied
throughout the rest of this chapter. The plus and minus signs will be
called 'indicators' and that which follows them will be called
'descriptives'.



2.1.1.2 Antonyms may be divided into two categories. One group
which may be described as "non-gradable" covers antonyms like married
and single which do not admit of degree. The second, describable as
"gradable", embraces antonyms like big and small which do admit of
degree. These resemble conversives in that when one is replaced by
the other in a sentence in conjunction with a transformational rule,
a paraphrase is produced. Because of the conversive relationship
between buy and sell, for example, the paraphrase 'Fred bought
something from John' may be derived from 'John sold something to Fred'
by inverting the relative sequence of the nouns. By a similar
inversion, 'Fred is smaller than John' may be derived from 'John is
bigger than Fred'.

Gradable antonyms are responsible for what Weinreich^9 calls
"impure linking", a type of syntagmatic relationship in which it is
not anomalous for one noun to be qualified by two adjectives which
are antonyms. In the sentence 'A small elephant is big' the two
adjectives are not incompatible, since the word small refers to
elephant standards and big to other standards. Because big and small
have this property, it would be difficult to encode them within the
plus and minus convention.

Antonymy along with paradigmatic relationships in general
operates not so much between words as between given meanings of words.
As the French word libre, for example, has many different meanings,
so it has many antonyms as follows: prisonnier, captif, esclave,
forcé, occupé, gêné and embarrassé. In English the word animal is
sometimes the opposite of human and sometimes includes humans when it
is the opposite of plant. These are what Duchacek calls partial
antonyms. The antonym relationship between some pairs of words
applies only in certain idioms. For example, tort is the antonym of
raison only insofar as 'avoir raison' and 'donner raison' are antonyms
of 'avoir tort' and 'donner tort' respectively. These are called
phraseological antonyms.

2.1.2 In Katz and Fodor's marker theory distinctive features are
organised not only into antonym pairs, but also into hierarchies.
While the number of features included in their representation of the
French word canard would probably not be adequate for computation in
an actual experiment, it will be adopted in this discussion to explain
their theory. In the Larousse dictionary the different meanings
(along with the English translations of them) of this word are as
follows: m. ZOOL. duck; canard mâle, drake; canard sauvage, wild
duck. || FAM. nag, jade (cheval); squawk (false note); hoax, false
report (ou) news, canard (fausse nouvelle); rag (journal); lump of
sugar dipped in brandy or coffee (sucre); marcher comme un canard,
to waddle. (V. DANDINER [SE].) In Katz and Fodor's tree a selection
of them is organised as in figure 1. In this diagram round brackets
represent distinctive features (or "markers" as they are called) and
square ones, "distinguishers", which denote that part of a word's
semantic content that is allegedly not necessary for computation.
This issue will be taken up in a later chapter.

[Figure 1: Katz and Fodor's marker tree for "canard"]

For computational analysis the above type of tree would be
replaced by a one-dimensional formula as follows: "canard" =
-concrete (-sound [fausse nouvelle] or +sound [note fausse]) or
+concrete (-animate (+square [morceau de sucre] or -square [journal])
or +animate [volatile]). In this formula square brackets have the
same significance as they have in the above tree. The round brackets
do not, but are in fact isomorphic with the branches of a tree.

An example, which Katz and Fodor would probably accept, of
how to select the correct contextual meaning of canard may be seen in
the analysis of the sentence 'le canard respire', for which an
appropriate marker formula for the word respire is "respire" =
+concrete (+animate [vivre]). To eliminate the non-contextual
meanings of canard the markers of each word are matched against each
other. The first one in the formula for respire, which is
/+concrete/, is matched against the first one in that for canard,
/-concrete/. Since they differ, the contents of the enclosed round
brackets beyond /-concrete/ are ignored and analysis starts again
after the second or. Since the markers for both words this time are
identical, being /+concrete/, the second marker for respire,
/+animate/, is located and searching is now limited in canard to the
confines of the bracketed portion following /+concrete/. Since the
marker for respire does not match /-animate/ for canard, the markers
within round brackets that follow this one are passed over and
analysis proceeds after or, whereupon a match is found. Since there
are no further markers for either word, marker analysis ends and the
distinguisher [volatile], determined by the match, is extracted as an
indication of the contextual meaning of canard. That the requisite
distinguisher is found at the end of the formula in this example is
purely coincidental. With a different arrangement of markers it might
have been located in the middle.
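
The matching procedure just described can be sketched as a recursive
walk over the bracketed formula. The nested-list encoding and the
function name `readings` are assumptions introduced for illustration;
the markers and distinguishers are those given for canard and respire
above.

```python
# Each alternative is (marker, rest); rest is either a distinguisher
# (a string) or a further list of alternatives joined by 'or'.
CANARD = [
    ("-concrete", [("-sound", "fausse nouvelle"),
                   ("+sound", "note fausse")]),
    ("+concrete", [("-animate", [("+square", "morceau de sucre"),
                                 ("-square", "journal")]),
                   ("+animate", "volatile")]),
]

# Markers contributed by the context word 'respire', in order.
RESPIRE = ["+concrete", "+animate"]


def readings(alternatives, context):
    """Collect the distinguishers compatible with the context markers."""
    found = []
    for marker, rest in alternatives:
        if context and marker != context[0]:
            continue  # mismatch: skip everything inside this bracket
        if isinstance(rest, str):
            found.append(rest)  # reached a distinguisher
        else:
            found.extend(readings(rest, context[1:]))
    return found


print(readings(CANARD, RESPIRE))  # ['volatile']
```

When the context supplies fewer markers than the depth of the tree,
this sketch returns every distinguisher under the matched branch,
leaving the word ambiguous. That is a design choice of the example and
is not claimed for Katz and Fodor's own procedure.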

2.1.3 While from the discussion so far the utility of Katz and
Fodor's theory appears to depend upon its capacity to resolve
ambiguity, other claims have been made for it. Postal suggests
that "the semantic component provides each sentence with a semantic
interpretation in the form of a set of readings and accounts for the
speaker's knowledge of the facts of meaning." Nida claims that Katz
and Fodor's tree could handle the distinction between the central and
peripheral meanings of a word, through the location of the former on
the left branches and of the latter on the right. These two claims
seem to be based on the appearance of the tree rather than on its
actual functioning in computation.

In light of Bolinger's criticism of Katz and Fodor's
theory, it appears that the small scale on which they envisaged
setting up trees would be insufficient. While it is extreme to claim
that the necessity to add markers to one of Katz and Fodor's trees as
it was tested on sample sentences invalidated their theory, his
findings suggest that the content of marker trees will have to
represent not dictionary but encyclopedia entries in order to be
functional.

Because of the nature of encyclopedic knowledge the
representation of it requires flexibility in the structure of the
marker tree. While Katz and Fodor found a set of markers amenable to
hierarchical organisation for the word canard, such a set is hard to
find for the English noun trunk. While a comprehensive notation for
the word cannot be determined within the scope of this discussion,
since it would require a lot of empirical data, a suitable formula
for the meanings of trunk as a noun would appear to be of the
following type: "trunk" = +container...[box]; -timber [elephant's
nose] or +timber [portion of tree]. In this formula the departure
from Katz and Fodor's organisation of markers is indicated by the
replacement of parentheses by a semicolon. This arrangement provides
for a computational analysis of all the markers for the word "trunk".
In section 2.1.4.2 it will be observed that further extensions of
marker logic are appropriate for certain words.

After the emendations have been made, the question of
syntagmatic relationships remains. While marker theory may be applied
to immediate constituents, it is not amenable to words like stol in
Lukjanow's Russian example (chapter 1, section 1.3-2.1.1) that
require the scanning of the context beyond the sentence for
disambiguation. In a text where 'He sat on a trunk' occurred, the
establishment of a link between trunk and a word in another sentence
would require an independent approach. Such an approach comes within
the province of the more sophisticated classification schemes, which
will be discussed in chapter 3, and which in turn come within the
scope of a computerised semantics. Katz and Fodor's tree when revised
would be a useful starting point for a computerised dictionary.

To organise one accurately it is necessary to distinguish
between the core and peripheral meaning of a word. The former is
what distinguishes it from another word. The core meaning of fork,
for example, consists of semantic components of the type physical
object, artifact and used for eating, and its peripheral meaning of
the components having a certain average size and not being used in
all cultures. This distinction is relevant insofar as a semantic
tree based on a word's core meaning is less likely to need adjusting
for each new sentence to which it is applied than one based on the
peripheral meaning of a word. The use of the word pen, which may mean
either a writing instrument or an enclosure for animals, may be
considered in the sentence 'The horse is in the pen'. It is
theoretically possible to disambiguate pen by assigning to its first
meaning the marker /+compact/ and to its second meaning, /-compact/,
which would likewise apply to the word horse. While these markers
may disambiguate pen in the above sentence, there is little guarantee
that they will be equally effective in the computational analysis of
unknown sentences. In the dictionary entry for pen the attributes
pertaining to each of its two meanings would be indicated. Ink and
cartridges, for example, would be specified as what pen (the writing
instrument) contains. In the entry for pen (the enclosure for
animals) animals would be specified as being contained in it. That
the latter meaning of pen is the contextual one in the above sentence
'The horse is in the pen' would be determined by matching the
attributes pertaining to horse with those applying to pen.
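
A minimal sketch of such attribute matching follows. The attribute
names ("contains", "is_a") and the dictionary layout are hypothetical;
the text specifies only that each sense of pen lists what it contains
and that horse carries attributes that can be matched against that
list.

```python
# Hypothetical dictionary entries: each sense of 'pen' lists what it contains.
PEN_SENSES = {
    "writing instrument": {"contains": {"ink", "cartridge"}},
    "enclosure for animals": {"contains": {"animal"}},
}

# Hypothetical entry for 'horse': the classes it belongs to.
HORSE = {"is_a": {"animal"}}


def senses_containing(senses, thing):
    """Return the senses whose 'contains' set overlaps the thing's classes."""
    return [name for name, attrs in senses.items()
            if attrs["contains"] & thing["is_a"]]


print(senses_containing(PEN_SENSES, HORSE))  # ['enclosure for animals']
```

Unlike the ad hoc marker /±compact/, the attribute lists are not
tailored to one sentence, so the same entries could in principle be
reused for other contexts in which something is said to be in a pen.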

The core meaning of a word may be arrived at by examining
its figurative usage, since figures of speech are often not formed
arbitrarily but on the basis of some part of a word's meaning however
inapparent it may be. That obstacle is a core meaning of the word line
is determined by a metaphorical analogy with the word fence. Because
of this analogy with fences in 'He leapt over the fence' the use of
line in the sentence 'He leapt over the line' is not anomalous. By
testing the word with various other metaphorical analogies one may
piece together the various strands that contribute to the word's core
meaning. A dictionary entry so constructed would be applicable to the
translation of idioms based on figurative usage, such as those in
English that involve animal names to describe personality traits. The
translation of the idiom 'He is a rat', for example, would be
accomplished by pinpointing through the word He the contextual core
meaning of rat, namely, that it refers to an unpleasant person. Thus,
where the target language is one like Zuni, in which animal figures
of speech are used to describe a person's physical rather than
psychological characteristics,^18 the translation of rat would not be
another animal name, but whatever corresponds most closely to the
core meaning of the English term. Thus the figurative use of rat is
incorporated into computational analysis by making it an integral part
of the structure of the dictionary entry, in which the requirements of
translation are met with an independent distinguisher for the
psychological meaning of rat. This treatment of it corresponds to
Hirschberg's suggestion that "Un sens sera donc une correspondance
entre une désignation dans une langue et une désignation dans une
autre. . . ."

2.1.4 The flexibility of natural language, which allows authors to
redefine words and thereby cancel in some cases the lexical
relationships that would otherwise occur between them, explicates the
construction of a dictionary entry. In an unsophisticated lexicon
the antonym relation between freedom and slavery, for example, would
be represented by the markers /+liberty/ and /-liberty/ or the
equivalent. In George Orwell's novel, Nineteen Eighty-Four, the
motto 'Freedom is slavery' while cryptic is not anomalous, since a
penetrating analysis of the context will reveal the missing indices
prominent in its explicit paraphrase 'Freedom of the body is slavery
of the mind'. While this degree of accuracy in the construction of a
lexical entry may not be required for mechanical translation, it
would be pertinent to forms of fact retrieval that imitate the human
comprehension of a text.

The utility of a dictionary depends upon how a grammar is
applied to it. While the penetrating analysis of the above sentence
will probably remain within the sphere of literary research, it is
within the present scope of computational linguistics to provide an
explicit paraphrase of the type '+freedom of thing A is -freedom of
thing B', which is sufficient to convey the grammaticality if not the
meaning of the above sentence. Such computation is undertaken by
researchers in artificial intelligence and will be the topic of later
discussion.

2.1.4.1 Weinreich's^21 claim that the relationships between the
components of a sentence may also occur between the elements of an
encoded dictionary definition of a single word is valid for Katz and
Fodor's markers. These are in fact adjectives belonging to order
classes and are based upon a pragmatic syntax similar to that of
coordinate retrieval, which will be discussed in chapter 3, section
3.2.1.2. Order classes are formed by the type of adjectives that
occur in the phrase 'ten big young men'. While the English rules of
grammar require the words to be in this sequence, the demands of
comprehension do not. The interpretation of 'men young ten big',
for example, is unambiguous. On similar grounds markers of the type
/+female, +offspring/ or /+offspring, +female/ for the word daughter
are subject to only a single interpretation and constitute what
Weinreich calls a cluster.

Groups of ordered markers which are grouped into constituent
structures are called "configurations" by Weinreich^22 and "downgraded"
constructions by Leech.^23 While the study of syntactic relationships
is often relegated to grammar, by their appearance in the definition
of a word, they also confront the lexicographer. Katz and Fodor's
theory does not accommodate ordered markers. The representation of
employer, for example, would be in terms of the markers /+human,
+hiring/. Since this cluster would be equally applicable to the word
employee, for which the most appropriate paraphrase is 'a worker who
is hired by someone' in contrast to 'a person who hires someone' for
employer, the notation used in marker theory is overwhelmed.

In Weinreich's notation the distinction between the two
words is made by the direction of the arrow. The formula /human ←
hiring/ is the representation of the word employer and /human →
hiring/ of employee. In this convention the markers human and hiring
denote the core of meaning that the two words have in common and the
different arrows specify how it is organised differently in each word.
For the representation of an utterance in a text the same type of
notation would apply so that the phrase 'overworked employer', for
example, would be assigned the formula /overworked, human ← hiring/
in which the adjective-noun relation between the words overworked
and employer is designated as a relation between the components
overworked and human. Weinreich's notation serves to emphasise that
the differentiation made by some linguists between the representation
of a text and that of a dictionary definition has more to do with
keeping separate the linguistic disciplines of grammar and lexicography
than with linguistic reality.
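
The difference between a cluster and a configuration can be
illustrated with two data structures. The encoding is an assumption
made for this sketch: an unordered cluster is modelled as a frozenset,
and a configuration as a tuple whose middle element stands for the
direction of the arrow.

```python
# Clusters: unordered, so the sequence of markers makes no difference.
daughter_a = frozenset({"+female", "+offspring"})
daughter_b = frozenset({"+offspring", "+female"})

# As clusters, employer and employee receive the same representation.
employer_cluster = frozenset({"+human", "+hiring"})
employee_cluster = frozenset({"+hiring", "+human"})

# Configurations: the arrow direction is part of the entry.
employer = ("human", "<-", "hiring")
employee = ("human", "->", "hiring")

print(daughter_a == daughter_b)              # True: one interpretation
print(employer_cluster == employee_cluster)  # True: the cluster cannot tell them apart
print(employer == employee)                  # False: the configuration can
```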

In the assignment of markers to words, the results of
morphological analysis, an adjunct to grammar, differ from those of
lexicographical analysis, which has to do with the semantic
representation of both marked and unmarked categories. The four
sentences 'I counted the boys', 'I counted the boy', 'I counted the
crowd' and 'The crowd is facing us' may be considered. In the
assignment of the marker /+plural/ to counted and boys and of
/-plural/ to boy the two approaches agree. In the representation of
crowd, however, they conflict. Morphological analysis designates the
word as /-plural/ on the basis of the zero presence of a plural
morpheme, while lexicographical analysis places it as /+plural/.
Although the latter analysis would take into account the selectional
restrictions according to which 'I counted the boy' is anomalous and
'I counted the boys' is grammatical, the former one embraces the
sentence 'The crowd is facing us', in which crowd, being the subject
rather than the object, is singular. The integration of the two types
of analysis has been included within transformational grammar, in
which, according to Postal, "the sub-component of syntactic rules
which enumerates underlying phrase markers (for example, Noun Phrase
and Verb Phrase) is itself divided into two elements, one containing
phrase structure rules (for example, Sentence → Noun Phrase + Verb
Phrase, Noun Phrase → Determiner + Noun) and the other containing a
lexicon or dictionary of highly structured morpheme entries which are
inserted into the structures enumerated by the phrase structure
rules".

The weakness of marker theory, that it does not show how the
semantic content of a word is organised in terms of its grammatical
status, is avoided by the transformationalists. The sentences 'Pity
excites the boy' and 'The boy excites pity' may be considered. In
each the meaning of excites is different, being paraphrasable in the
first sentence by the utterance 'stirs excitement in' and in the
second, by 'causes something to be excited (in someone)'. For the
first meaning of excite the marker format for the verb frighten
proposed by Chomsky and Postal, /+Verb, +[Abstract] Subject,
+[Animate] Object/, would be appropriate to disambiguate it from the
second. In this type of formula the unbracketed categories are
syntactic ones, in which the first indicates a word's part of speech,
a verb in the case of excite, and subsequent categories, its syntactic
environment. The categories in square brackets designate the required
semantic content of a word that is to occur in a given part of the
environment. In the sentence 'Pity [Abstract] excites the boy
[Animate]' the meaning of excite is deduced to be 'stirs excitement
in' not just because pity and boy belong to the appropriate semantic
categories, but because they assume the correct syntactic roles.

With integration of marker logic with syntax, the categories
in a dictionary entry are no longer analogous to order classes. The
relative positions of +[Abstract] Subject and +[Animate] Object
in the above formula are important, since a different sequence such
as the one in the formula /+Verb, +[Animate] Subject, +[Abstract]
Object/ would designate the second meaning of excite, namely, 'causes
something to be excited (in someone)'. This type of notation, which
specifies the meaning of a word in terms of its environment, is the
crux of a computerised semantics and will be discussed further in a
later chapter.
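
As a sketch of how such environment-dependent entries might be
consulted, the two formulae for excite can be stored as frames keyed
by syntactic role. The frame layout, the `CATEGORY` table and the
function name `sense_of` are assumptions for illustration; the
categories and paraphrases are those given above.

```python
# Two frames for 'excite': required semantic category per syntactic role.
EXCITE = [
    ({"subject": "Abstract", "object": "Animate"},
     "stirs excitement in"),
    ({"subject": "Animate", "object": "Abstract"},
     "causes something to be excited (in someone)"),
]

# Hypothetical semantic categories for the nouns in the example sentences.
CATEGORY = {"pity": "Abstract", "boy": "Animate"}


def sense_of(frames, subject, obj):
    """Pick the paraphrase whose frame matches both role fillers."""
    roles = {"subject": CATEGORY[subject], "object": CATEGORY[obj]}
    for frame, paraphrase in frames:
        if frame == roles:
            return paraphrase
    return None  # no frame fits: the sentence would be anomalous


print(sense_of(EXCITE, "pity", "boy"))  # 'Pity excites the boy'
print(sense_of(EXCITE, "boy", "pity"))  # 'The boy excites pity'
```

The same two nouns select different senses depending only on which
role each fills, which is the point made against unordered markers.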

2.1.4.2 While a variety of syntagmatic relationships other than the
one between the subject, verb and object might be included in a
dictionary entry, it is not within the scope of the present development
of linguistic theory to provide an exhaustive list of them. A sample,
however, will suffice to emphasise the necessity of a more elaborate
assemblage of indices than that provided by Katz and Fodor's marker
theory.

One syntagmatic relationship concerns verbs of mention, of
which speak in the sentence 'It is nonsense to speak of a king as made
of plastic' is an example. Within the framework of Katz and Fodor's
theory the words king and plastic would be abstracted from the
sentence and through their respective formulae, "king" = +animate
[monarch] or -animate [chess piece] and "plastic" = -animate, the
marker logic outlined in section 2.1.2 would ensure that king
meaning a chess piece would be selected as the contextual meaning.
Since the usually anomalous meaning is the one required in the
environment of 'It is nonsense to . . .', marker logic would work only
if in this environment the formula for king were altered by a
grammatical rule to "king" = -animate [monarch] or +animate [chess
piece]. Such an alteration might take place in mechanical procedure
through the assignment to the word nonsense of a symbol, which would
be operative whenever the word speak introduced a noun phrase
syntactically connected with nonsense.
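
The alteration of the formula can be sketched as a sign-flipping step
applied before ordinary marker matching. The list encoding and the
function names `flip` and `select` are hypothetical; how a program
would detect that nonsense governs the noun phrase is left out of the
sketch entirely.

```python
# "king" = +animate [monarch] or -animate [chess piece]
KING = [("+animate", "monarch"), ("-animate", "chess piece")]
PLASTIC = "-animate"


def flip(formula):
    """Invert every indicator, as a 'nonsense' environment would require."""
    swap = {"+": "-", "-": "+"}
    return [(swap[marker[0]] + marker[1:], dist) for marker, dist in formula]


def select(formula, marker):
    """Return the distinguishers whose marker equals the context marker."""
    return [dist for m, dist in formula if m == marker]


print(select(KING, PLASTIC))        # ordinary environment
print(select(flip(KING), PLASTIC))  # under 'It is nonsense to speak of ...'
```

In the ordinary environment the chess piece is selected; after the
indicators are inverted, the monarch reading is the one that matches
plastic, which is the reading the sentence declares to be nonsense.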

The application of mechanical procedure may be complicated
by the absence of a verb of speaking. Such an omission is evident in
the sentence 'That stallion is a mare', which, as a facetious
remark, is not anomalous. The missing indices may be observed in
the paraphrase, 'What you called a stallion is a mare'. In the
present state of computational linguistics, the mechanical detection
of such facetious remarks will have to be shelved, although a
pragmatic measure may be adopted. Since in written works apparent
contradictions usually occur on purpose, they might in default of any
other analysis be treated as cases where a verb of mention is implied,
where they occur frequently.

The scope of the amended marker notation may be extended to
cases where three syntagmatic links are involved. The relationship
represented in the above formulae by the operator or is appropriate in
the representation of the anomalous use of the word sad in the
sentence 'John is as sad as the book he read yesterday' for which
the formula for sad is "sad" = /-animate [cause emotion] or +animate
[have emotion]/. The or is the same exclusive one that featured in
the one-dimensional notation for canard in Katz and Fodor's example
(section 2.1.2). In the above sentence the marker /+animate/ is
identified as representing the contextual meaning of the word sad,
because the marker for John, /+animate/, which Weinreich calls a
transfer feature, is matched against the formula before the one for
book, /-animate/.

The or operator is not applicable to all cases of three
syntagmatic links. In the sentence 'John is heavier than this
rock', the grammaticality of the use of 'heavier' might, for
example, be conveyed through the formula, "heavier" = /+animate,
-animate/, where the comma functions as the antonym of or. Both
markers may be selected as appropriate for a given context, since
whatever type of noun is qualified by this word, heavier has the same
meaning. A borderline case is provided by the word take. While
generally its representation by two markers separated by the operator
or is accurate, this type of representation does not take into account
the occurrence of zeugma, whereby the use of take in the satirical
sentence 'Queen Anne does sometimes counsel take and sometimes tea',
for example, is pernissible. A possible formula for the v^ord take 
that takes into account this usage miftht be "take" = /-^zeuma/ or 
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-zeugma/ ^ where the double sl«':sli lines vrould indicate that the 
choice of operator depended on the oxtralinrruictic context of t&ke. 

The constraints of syntagmatic usage affect the plus and minus
indicators. The dichotomy between them is not suitable for the
syntagmatic representation of all the relations that involve mutually
exclusive markers. These come under Lyons' heading of
incompatibility. According to his definition "the assertion of a sentence
containing one of the terms over which the relation holds can be shown
to be understood as implicitly denying each of the sentences formed by
the substitution of any one of the other terms of the set in the
context in which the given term occurs." Katz and Fodor's indicators
are suitable for expressing the relationship between antonyms (polar
systems) but not that between words in a multiple taxonomic grouping
like the colour system. That the indicators do not effectively
represent it may be observed in their lack of ability (however little
required in practice) to detect such sentences as 'Red is green' and
'Blue is green' as contradictory.

Multiple taxonomic systems may be divided into two kinds,
hierarchic and non-hierarchic. An example of a non-hierarchic
system may be observed in the names of colours, for which instead of
two indicators there would be several. For the words red, green
and blue, for example, the markers might consist of the respective
combinations of indicator and descriptive, /Red Colour/, /Green
Colour/ and /Blue Colour/. The anomalousness of the sentences in the
previous paragraph would be detected by the same marker logic as
before, since Red, Green and Blue oppose each other as plus and minus
do, but through a larger vocabulary of indicators.
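
The marker logic for a non-hierarchic system may be sketched as
follows. The sketch assumes that every marker is stored as a pair of
indicator and descriptive; the table MARKERS and the function
contradictory are hypothetical names, and the polar entries for
stallion and mare are added only to show that plus and minus fit the
same test.

```python
# Assumed markers, each an (indicator, descriptive) pair.
MARKERS = {
    "red": ("Red", "Colour"),
    "green": ("Green", "Colour"),
    "blue": ("Blue", "Colour"),
    # Hypothetical polar markers, /+Male/ and /-Male/.
    "stallion": ("+", "Male"),
    "mare": ("-", "Male"),
}


def contradictory(subject, complement):
    """'X is Y' is flagged when both words share a descriptive but
    carry different indicators."""
    indicator_a, descriptive_a = MARKERS[subject]
    indicator_b, descriptive_b = MARKERS[complement]
    return descriptive_a == descriptive_b and indicator_a != indicator_b


print(contradictory("red", "green"))      # same descriptive, other indicator
print(contradictory("stallion", "mare"))  # the polar case behaves alike
```

The same comparison serves two indicators or many, which is the point
of treating the colour names as an enlarged vocabulary of indicators.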

An example of a hierarchic system may be observed in the
relationship between part and whole in body parts, among which the
words man, arm and finger may be considered. This system demands
more changes in the marker code than the non-hierarchic one above,
since not only are more indicators necessary but they need to be
hierarchically ordered. To express the hierarchy between the words
man, arm and finger, markers of the type /3 Body/, /2 Body/ and
/1 Body/ might be assigned respectively to them. The sentences
'The man has an arm' and 'The arm has a finger' would be recognised
as being more acceptable than 'The arm has a man' by the fact that
the subject of the sentence is assigned a higher number than the
object in the first two.
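
A minimal sketch of this ordering test follows. BODY_RANK and
has_is_acceptable are hypothetical names; the numbers are those of the
markers /3 Body/, /2 Body/ and /1 Body/.

```python
# Numeric indicators taken from the markers /3 Body/, /2 Body/, /1 Body/.
BODY_RANK = {"man": 3, "arm": 2, "finger": 1}


def has_is_acceptable(subject, obj):
    """'The SUBJECT has a OBJ' is acceptable when the subject carries
    the higher number, that is, when the whole precedes the part."""
    return BODY_RANK[subject] > BODY_RANK[obj]


for subject, obj in [("man", "arm"), ("arm", "finger"), ("arm", "man")]:
    print(subject, obj, has_is_acceptable(subject, obj))
```

Under this comparison 'The man has a finger' is also accepted, since
only the order of the numbers is tested and not their adjacency.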

2.2 The Semantic Interpretation of Syntactic Structure

2.2.1 While the nucleus for research on the construction of
dictionary entries was provided by Katz and Fodor's marker theory,
no theory of equal weight has appeared as a mechanical model of how
syntactic structure conveys information. A minimal requirement of
such a model would be to illustrate how sentences consisting of
different elements may be synonymous. Some of the research based on
Chomsky's³⁹ transformational grammar, while an attempt to meet this
requirement, focuses only on the least complicated problems of
paraphrase. Categories used as a tool by which to apply rules for
recognising paraphrases are called cases or role indicators.

One kind of paraphrase involves the replacement of one word
by a certain other in conjunction with a reversal of word order. The
relationship between these two words is said to be a conversive one.
By such criteria the sentence 'John sells books to Mary' would be
recognised as a paraphrase of 'Mary buys books from John' because of
the interchanging of the words Mary and John and the conversive
relationship between sells...to and buys...from. A convenient
notation might be /Active barter/ for sells...to and /Passive barter/
for buys...from, in which the non-capitalised item is a marker and the
capitalised one is a case. A transformational rule would recognise the
above two sentences as paraphrases by means of the conversives, which
match with respect to markers but differ with respect to cases, one
being Passive and the other, Active. This type of notation is
applicable to Chomsky's familiar active-passive transformation, so
that a sentence like 'Joe strikes John' may be recognised as the
paraphrase of one like 'John is struck by Joe' through the assignment
of /Active hit/ to strikes and /Passive hit/ to 'is struck by'. For
the sentences in this example the setting up of the notation is aided
by explicit indices.
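
The rule for conversives may be sketched as follows. The sketch
assumes that each sentence has already been reduced to a triple of
first noun, predicate and second noun, a simplification that leaves
out the constant element books; LEXICON and are_paraphrases are
hypothetical names.

```python
# Each predicate carries a (case, marker) pair, as in /Active barter/.
LEXICON = {
    "sells...to": ("Active", "barter"),
    "buys...from": ("Passive", "barter"),
    "strikes": ("Active", "hit"),
    "is struck by": ("Passive", "hit"),
}


def are_paraphrases(first, second):
    """Two triples are paraphrases when the predicates agree in marker,
    differ in case, and the two nouns have changed places."""
    noun1_a, predicate_a, noun2_a = first
    noun1_b, predicate_b, noun2_b = second
    case_a, marker_a = LEXICON[predicate_a]
    case_b, marker_b = LEXICON[predicate_b]
    conversive = marker_a == marker_b and case_a != case_b
    reversed_order = noun1_a == noun2_b and noun2_a == noun1_b
    return conversive and reversed_order


print(are_paraphrases(("John", "sells...to", "Mary"),
                      ("Mary", "buys...from", "John")))
print(are_paraphrases(("Joe", "strikes", "John"),
                      ("John", "is struck by", "Joe")))
```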

Cases are applicable to the types of paraphrase that do
not involve a paradigmatic relationship like the conversive one above.
The fact that the sentence 'John ruined a table' may be paraphrased
by 'What John did to a table was ruin it', but not the sentence 'John
built a table' by 'What John did to a table was build it', may be
traced to the different semantic categories to which ruin and build
belong. In order that a verb may fit into the slot after 'What John
did to the table was', the existence of the object of the verb
must be predicated as being prior to the action of the verb. The verb
ruin but not build meets this requirement. Correspondingly the
category of ruin is specified as Affecting and that of build as
Effecting, in which capitalisation designates these representations
as cases.

In Nida's "object - event" analysis, cases are used to
correlate words of different parts of speech and on different
syntactic levels to detect paraphrases. One may consider phrases
consisting of an adjective and noun that are synonymous with
combinations of verb and adverb, in which the adjective is isomorphic
with the adverb and the noun with the verb. The three words in Nida's
sentence 'He works excellently' and the three respective non-bracketed
words in 'His work [is] excellent' that correspond to them are
amenable to case analysis. Three cases in the formula, Object ->
Event <- Abstract, in this order represent the words of each of the
above sentences. For computation, these cases would have to be
related to a surface structure grammar, which Nida does not do. The
atomisation of parts of speech, in this case adjective, noun, verb
and adverb, which Bar-Hillel aimed for in his categorial grammar,
might provide the requisite components for arriving at the formula
through a step by step procedure.

Fillmore's case system covers paraphrases that have to do
with transitive and intransitive verbs, of which the word move is an
example of both. A syntactic analysis of the sentences in his
example, 'The rock moved', 'The wind moved the rock' and 'I moved the
rock with a stick', would indicate rock as the subject in the first
sentence and the object in the second and third. Since the
combinations of the words moved and rock bear the same meaning throughout,
notwithstanding word order, the word rock in Fillmore's notation is
assigned the case, Object. The words stick and wind, since they refer
to a force directly responsible for action, are given the case,
Instrument. The word I, which refers to a force indirectly
responsible for action, is assigned the case, Agent.

The interaction between the cases is indicated through
Fillmore's following formula: Object; (Instrument); (Agent),
in which the brackets signify that the appearance in a sentence of
the case in question is optional. The rule according to which words
belonging to a given case fit into a sentence is as follows: if the
Agent is not present, the Instrument is the subject and if this case
is not present the Object is the subject of the sentence. Even with
the indication of the surface structure to which the cases relate,
Fillmore's system is incomplete. In order to apply the case system
the formula for each word must include markers. For example, the
noun rock, since it cannot function as the Object of every verb,
might be assigned the formula /Object; motion/, and the verb move
might be assigned the semantic component /motion/. The fact that rock
and move have the same semantic component, /motion/, and that rock is
assigned the case, Object, is the computational means by which rock is
detected as the Object of move.
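
The subject rule and the marker check may be sketched together as
follows. The function and table names are hypothetical, the sentences
are assumed to be already analysed into cases, and the entry for think
is invented solely to provide a verb whose component differs from
/motion/.

```python
def choose_subject(cases):
    """Apply the rule: Agent if present, else Instrument, else Object.
    `cases` maps a case name to the word that fills it."""
    for case in ("Agent", "Instrument", "Object"):
        if case in cases:
            return cases[case]
    raise ValueError("the Object case is obligatory")


# Assumed formulae: /Object; motion/ for rock, /motion/ for move.
NOUN_FORMULAE = {"rock": ("Object", "motion")}
VERB_COMPONENTS = {"move": "motion", "think": "cognition"}


def is_object_of(noun, verb):
    """A noun is detected as the Object of a verb when its case is
    Object and its semantic component matches that of the verb."""
    case, component = NOUN_FORMULAE[noun]
    return case == "Object" and component == VERB_COMPONENTS[verb]


print(choose_subject({"Object": "rock"}))
print(choose_subject({"Instrument": "wind", "Object": "rock"}))
print(choose_subject({"Agent": "I", "Instrument": "stick", "Object": "rock"}))
print(is_object_of("rock", "move"))
```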

2.2.2 The resources of theoretical linguistics from which a
computerised semantics may be constructed peter out beyond the
recognition of the simplest cases of paraphrase, although further
lines of investigation have been postulated. Bellert, for instance,
claims that "The surface structure of an utterance in which linguistic
indices are explicitly expressed is clearly much closer to its LS
[Logico-Semantic] structure representation than those of its
paraphrases in which linguistic indices do not occur, although they are
somehow implied if the utterances are recognised as paraphrases".
The LS structure of an utterance is the formulation of its meaning in
terms of explicit indices. The syntagmatic difference between the two
sentences 'John is easy to please' and 'John is eager to please' would
be conveyed in LS structure through the paraphrases 'John is easy for
someone to please' and 'John is eager to please someone', which
contain the indices for and someone. In computational linguistics
these techniques of transformational grammar have been applied in the
construction of an interlingua.

Another line of investigation is pursued by Weinreich in
his formulation of certain syntagmatic properties of utterances.
These properties come under his heading of linking, within the scope
of which are included the sentences 'The wall is white', 'The wall's
whiteness is astonishing' and 'The wall is astonishingly white'.
They are represented by his formulae, (wall, white), (wall, ^white,
astonishing) and (wall, astonishing ^white), in which the adjectives
represented in superscript are considered to form the most direct
link with wall and the others, a less direct one. Weinreich's
notation conveys information that would be provided by immediate
constituent analysis, in which the second sentence would be said to
differ from the third in having wall's and whiteness rather than
astonishing and white as immediate constituents. The originality of
Weinreich's contribution appears to consist mainly in his placing
syntagmatic linking within the province of semantics.

In the analysis of how information is conveyed via syntactic
structure linguists have focussed their attention upon the task of
representing the constituents of sentences themselves and have tended
to shy away from incorporating encyclopedic knowledge. In Nida's
terminology, the first task concerns constructions from which the
meaning of a whole utterance can be derived from the meaning of its
parts, semantically endocentric ones, and the second task,
constructions from which the meaning of the whole utterance cannot be so
derived, semantically exocentric ones. Noel's utterance 'liberation
du proletariat', which he considers to be an example of "metaphorical
logic", is an exocentric construction, since from the dictionary
meaning of the words liberation, du and proletariat it is not possible
to derive the paraphrase 'etablissement d'un nouvel ordre economique
favorable aux travailleurs'.

The reliance of authors on extralinguistic context makes
paraphrase recognition in the case of exocentric constructions
difficult. Because of such reliance the dictionary meaning of a word
in a given context may be rendered redundant. For example, while the
two sentences 'A paradigm is a set of substitutable forms' and 'By a
paradigm we understand a set of substitutable forms' differ
superficially with respect to the lexical meaning of understand, they are
nonetheless paraphrases of each other, because this word conveys no
information in its context and effectively constitutes a function
word. In fact retrieval it would be desirable to have such a word
rejected as a keyword for this reason.

The question of encyclopedic knowledge is raised by
Todorov in his discussion of how to categorise the different
connotations of the French word manger in the utterances 'manger la
soupe' and 'manger une pomme'. In the first, manger refers to the
act of eating by means of a spoon and in the second, by means of one's
hands. Todorov calls these differences ones of reference rather than
ones of meaning. In mechanical translation the common core of meaning
that the two uses of manger have in these utterances, namely the act
of putting something in one's mouth and digesting it, would justify
treating the word as being unambiguous, if the target language were a
language like English, in which the word eat has similar connotations.
In sophisticated forms of fact retrieval involving the treatment of
relationships beyond the sentence level by means of a classification
scheme the connotations of manger would probably be differentiated.

What Laffal calls experiential validity comes under the
heading of encyclopedic knowledge. The two sentences 'I talked with
you' and 'I will talk with you' may be compared with 'I agreed with
you' and 'I will agree with you'. Between the first two sentences
there is a temporal distinction and the word will refers to the
future. In the second group of sentences the future tense usually
expressed by will does not have experiential validity, for, while one
may set a date as to when one will talk in human culture, one does
not usually prophesy one's agreement with someone.

The representation of encyclopedic knowledge calls for
greater specification than the type of cases mentioned so far provide.
Fillmore's cases indicate the semantic relationships between words
only insofar as these relationships show how a sentence of one kind
of syntactic structure may be paraphrased by a sentence of another
kind. Schank appropriately pointed out that of Fillmore's sentences
(section 2.2.1) 'The wind moved the rock' and 'I moved the rock with
a stick', the first differed from the second in that the act of
blowing was implied, which Fillmore's case system for I and wind did
not indicate. For Katz and Fodor's sentences 'Should we take junior
back to the zoo?', 'Should we take the lion back to the zoo?' and
'Should we take the bus back to the zoo?', semantic components such
as Animate and Human for the object of each sentence would serve to
indicate that the word take has different implications in each
sentence, but not how. In order to register certain information,
namely that lions are kept in cages, buses are ridden on and that
humans visit zoos, the dictionary entry for take back would have to
incorporate encyclopedic knowledge.

2.2.3 The notations of Quillian's mechanical memory and
Schank's mechanical intelligence are able to accommodate both
linguistic and encyclopedic knowledge, since they represent a
different type of relationship from that which was the predominant
concern of theoretical research. This difference may be brought out
by the explanation of certain terminology. In linguistics,
relationships between words in a text are said to be "syntagmatic" or
"synthetic" and those between dictionary entries, "paradigmatic" or
"analytic". Examples of the latter are the relationship between
fixes and teeth, which constitute part of the definition 'Someone who
fixes teeth' of the word dentist, and the one between rich and poor,
as they occur as dictionary entries. Synthetic relationships are
those that occur between words in a text and which do not form part
of any dictionary entry. For the purpose of explication it would be
convenient to amend the above terminology so that the term "analytic"
refers to words that are part of a dictionary definition and the term
"paradigmatic" to relationships between dictionary entries.

2.2.3.1 The empirical approaches of Schank and Quillian have to do
with analytic relationships, whereas theoretical linguistic research
is concerned with paradigmatic relationships. Quillian's memory
consists of nodes connected by different kinds of links. Each node
represents one of the meanings of a lexical word and each link
designates a function word, which conveys syntactic relationships
between lexical words. Dictionary entries are represented by "type
nodes" and the lexical words that are part of their definition, by
"token nodes". The word plant, for example, would be assigned three
type nodes, PLANT 1, PLANT 2 and PLANT 3, and their respective meanings,
'botanical organism', 'fix something firmly' and 'industrial complex',
would be encoded into token nodes and links of the type provided in
figure 2. In this model the full meaning of a word is derived from
an exhaustive tracing of the definitions of its token nodes. A word
so traced is called a patriarch word by Quillian. In the above
diagram PLANT 1 would be a patriarch word if its full meaning were
searched for by tracing the dotted arrows which lead from each of the
token nodes.
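
The exhaustive tracing of token nodes may be sketched as a
breadth-first search. The sketch is not Quillian's program: MEMORY is
a hypothetical fragment in which each type node lists the token nodes
of its definition, the links that stand for function words are
omitted, and the definitions of organism and living are invented for
the illustration.

```python
from collections import deque

# Hypothetical fragment of a memory: type node -> token nodes.
MEMORY = {
    "PLANT 1": ["botanical", "organism"],
    "PLANT 2": ["fix", "something", "firmly"],
    "PLANT 3": ["industrial", "complex"],
    "organism": ["living", "thing"],   # invented definition
    "living": ["organism"],            # invented, deliberately circular
}


def full_meaning(patriarch):
    """Collect every node reachable from the patriarch word by
    following the definitions of its token nodes."""
    reached = set()
    queue = deque(MEMORY.get(patriarch, []))
    while queue:
        token = queue.popleft()
        if token in reached:
            continue                   # guards against circular definitions
        reached.add(token)
        queue.extend(MEMORY.get(token, []))
    return reached


print(sorted(full_meaning("PLANT 1")))
```

The set of reached nodes stands in for the full meaning of the
patriarch word; a word with no entry of its own is treated as a
terminal node.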

While the mechanical memory has mainly to do with analytic
relations, paradigmatic ones could be represented implicitly by a
careful formulation of the definitions for each word. Ceccato's
model of semantic relationships represents paradigmatic relationships,
such as those of hyponymy, antonymy and conversiveness, by a diagram
consisting of words linked by correspondingly numbered arrows. In his
notation the hyponymy relationship between Animal and Dog would be
conveyed in the form 'Dog --03--> Animal', in which the species occurs
before the arrow and the genus after it and where 03 designates the
hyponymy relationship. In such a diagram the two utterances pear tree
and tree would not be included, since the hyponymy relationship is
conveyed by explicit indices and thereby is within the province of
grammar. Nonetheless the same relationship holds between these
utterances that occurs between Dog and Animal. Tree is the genus and
pear tree, the species. In the memory it would not be difficult to
incorporate the hyponymy relationship if dog were assigned explicit
indices in the form of the phrase canine animal, which would then be
encoded. The analogy between the first two utterances, pear tree and
dog, and the second two, tree and animal, would not be obscured through
their separation into two linguistic disciplines, which would happen in
Ceccato's model, unless it incorporated an indefinitely large number of
phrases and possibly of sentences.

2.2.3.2 In Schank's mechanical intelligence the notation consists
of terms which, for convenience, he represents with English words
rather than with code numbers, and of relationship items portrayed in
the form of different kinds of arrows. A sentence such as 'The big
man steals the red book' would receive the following two-dimensional
representation:

        man <==> steals <-- book
         ^                    ^
         |                    |
        big                  red

The double arrow represents what is called a two-way dependency
relationship and the single arrows, one-way dependency relationships.
This representation parallels immediate constituent analysis insofar
as the upward single arrows designate linked words, the horizontal
single arrow denotes linked elements of which one is a phrase and the
double arrow designates the linking of phrases, in this case, 'big
man' and 'steals the red book'. This type of notation, which appears
again in Wilks's approach (chapter 4), is more than an I.C. analysis
in that his components are words that have undergone language
normalisation. Those words are parsed according to their deep
structure significance so that the representation of the above
diagram, in which each term is replaced by its concept class, is as
follows:

        Picture Producer (PP) <==> ACT <-- PP
                 ^                          ^
                 |                          |
        Picture Aider (PA)                  PA

The connection between a sentence and this type of conceptual
representation is provided by "English realisation rules" of the
following type:

        PP <==> ACT = Noun Verb,   PP <-- PA = Adjective / Noun.

The above representation is thereby observed to correspond with the
parse, Adjective / Noun Verb Adjective Noun, of the
sentence 'The big man steals the red book'.

In the dictionary the above types of normalised words are
grouped into semantic categories. The representation for one meaning
of the word ball, for example, would be as follows:

        PA              PP                   ACT

        has texture     located anywhere     rolls
        has colour      for anyone           bounces
        has beauty      belongs to people    hits

An entry so structured is equipped for the resolution of multimeaning.
The concept classes, PA, PP, and ACT represent the semantic attributes
with which adjectives, nouns and verbs respectively in the environment
must be compatible, if this meaning of ball is to be selected
for a given context. Examples of words that meet these criteria are
to be found in the sentences 'The ball is red', 'The ball is John's'
and 'The ball is rolling', in which ball is understood to refer to a
round object rather than to dancing. In order to effectively judge
the above type of file, the results of large scale applications will
have to be examined. However, the categories in it appear to be based
on intuition and it may be suspected that the notation for them may
resemble that of Booth's concept numbers.
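
The compatibility test that such an entry supports may be sketched as
follows. The entry reproduces the attributes listed for ball; the
table CONTEXT_WORDS, which states the concept class and attribute that
each environment word is taken to realise, is an assumption made for
the illustration and is not part of Schank's file, and the entry for
formal is invented to provide a word that fails the test.

```python
# The round-object meaning of ball, keyed by concept class.
BALL_ROUND_OBJECT = {
    "PA": {"has texture", "has colour", "has beauty"},
    "PP": {"located anywhere", "for anyone", "belongs to people"},
    "ACT": {"rolls", "bounces", "hits"},
}

# Assumed analysis of environment words: (concept class, attribute).
CONTEXT_WORDS = {
    "red": ("PA", "has colour"),
    "John's": ("PP", "belongs to people"),
    "rolling": ("ACT", "rolls"),
    "formal": ("PA", "has formality"),   # invented counter-example
}


def sense_fits(entry, word):
    """A meaning is selectable when the attribute realised by the
    environment word is listed under the same concept class."""
    concept_class, attribute = CONTEXT_WORDS[word]
    return attribute in entry[concept_class]


for word in ("red", "John's", "rolling", "formal"):
    print(word, sense_fits(BALL_ROUND_OBJECT, word))
```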

Like the nodes and links in Quillian's memory, the
concepts and relationship items convey paradigmatic relationships
implicitly through analytic ones. The relationship between eat and
edible would be conveyed through the representation of edible as
follows: one <==> eat <-- 'thing' (N). Whereas in Quillian's memory the
semantic content shared by a group of words was elicited by tracing
the nodes and links, in Schank's dictionary the meanings are
factored into particular files. For example, although some of the
concepts appropriate to the word ball would be found in its entry,
others would be found in a "physical object" file. Files would list
encyclopedic as well as linguistic information. For example, to
understand the sentence 'Did Nixon run for President in 196^?' a
machine would search the experience files and interpret President
as 'President of the U.S.'

While Schank's mechanical intelligence accommodated
encyclopedic knowledge, research in computational linguistics was
generally orientated towards the traditional dictionary rather than
the encyclopedia. The makers of classification schemes, who were
often concerned with specific subject areas, had a different
perspective. For example, from a chemist's point of view a suitable
classification of a document entitled 'The Conversion of Water into
Hydrogen and Oxygen by Hydrolysis' would probably involve for the
utterances water, hydrogen and oxygen and hydrolysis the categories,
starting material, product and process respectively, which would
contribute to the structuring of encyclopedic knowledge. For source
material on how to organise it, it would be appropriate to examine
the classification schemes.

3 CONTRIBUTIONS OF THE CLASSIFICATION SCHEMES

3.1 General Classifications

3.1.1 As was demonstrated in the previous chapter, a classification
of encyclopedic knowledge and the means of showing how to encode
utterances of natural language into it would be essential to a
computerised semantics. Various classifications exist and are of two
kinds, general and special. The special classifications, each of
which embraces only a small part of the spectrum of knowledge but in
great detail, were created to serve the particular needs of researchers
in a given field and were significant in their abandonment of the
principles on which the general schemes were based. The general
classifications, which embrace the whole spectrum of knowledge, were
created for use in library science, in which the encoding of documents
to locate them on the shelves is performed by human intuition. The
notation of the general classifications is correspondingly not useful
for the mechanical encoding of utterances. Nonetheless, all these
classifications provide source material and even the notation offers
lessons, albeit negative, for computational linguistics.

In a classification scheme there are usually three basic
components: schedules, which are lists of groups of symbols that can
be suffixed to the main notation, the general tables, which provide
encyclopedic data on the lexical words of a language in the form of a
code-English dictionary and thereby material from which to construct
idioglossaries, and an index, which is an English-code dictionary.
Because the scope of a classification scheme includes encyclopedic
data, the assessment of it will be different from that of the usual
dictionary. Brown, who in his Subject Classification aimed to
classify the words in his index as concrete unities, missed this point.
For the word eggs, for example, he provided only one code number,
F601. The postulated unity, however, lies solely in the fact that
the word refers to the hard-shelled reproductive body of a fowl or
bird. Eggs fits into many subject areas. In Dewey's Decimal
Classification each application of the word is represented by a code
number. For the context of nutrition, for example, eggs is
represented by the number 612.39283 and for that of ornithology by

Among general classifications the following will be treated:
the Library of Congress (LC), the Dewey Decimal (DC), the Universal
Decimal (UDC), the Bibliographic (BC), the Subject (SC) and the
Colon (CC) classifications. UDC and DC, the first of which is
derived from the second, may be grouped together. In both, knowledge
is fitted into the following decimal framework: 000 generalia,
100 philosophy, 200 religion, 300 social sciences, 400 language,
500 science, 600 technology, 700 fine arts, 800 literature and
900 history, travel, biography. Successive subdivisions of these
topics are made by inserting digits between one and nine in the tens
and units columns and thereafter beyond the decimal point. For
example, technology is 600, engineering is 620, mechanical engineering
is 621 and machine tools is 621.9. While the decimal point is not
essential, it is inserted to divide up the digits for the human eye.
The zeroes at the end of a number, which are not necessary, are
eliminated in UDC.
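
The way in which each number contains its broader classes may be
sketched as follows. The function names dc_ancestors and udc_form are
hypothetical, and the sketch assumes numbers written with three
digits before the decimal point, as in the examples above.

```python
def dc_ancestors(number):
    """List the successively narrower classes that lead to `number`,
    e.g. 600, 620, 621 and 621.9 for 621.9."""
    digits = number.replace(".", "")
    chain = []
    for length in range(1, len(digits) + 1):
        padded = digits[:length].ljust(3, "0")   # pad to hundreds
        code = padded if len(padded) == 3 else padded[:3] + "." + padded[3:]
        if code not in chain:
            chain.append(code)
    return chain


def udc_form(number):
    """Drop the final zeroes, which UDC treats as unnecessary."""
    digits = number.replace(".", "").rstrip("0") or "0"
    return digits[:3] + ("." + digits[3:] if len(digits) > 3 else "")


print(dc_ancestors("621.9"))
print([udc_form(code) for code in dc_ancestors("621.9")])
```

For 621.9 the sketch yields the chain technology, engineering,
mechanical engineering and machine tools, and the corresponding UDC
forms 6, 62, 621 and 621.9.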

While the above notation is compact, the objective mapping
of knowledge seems to have been sacrificed to accommodate it. The
reason for grouping medicine 610, agriculture 630 and building 690
under the catch-all term, technology 600, appears to be that there
are not enough digits in the decimal system to accommodate more main
classes. Ranganathan's octave device provided a means of overcoming
this limitation. In his notation a digit preceded by any number of
nines is considered to belong to the same subdivision as one preceded
by no nines at all. The numbers, 1, 2 ... 8, 91, 92, 93 ... 98, 991 and
992, would represent main classes. As the number of nines that can
precede a digit is infinite, so is the number of possible terms in a
given subdivision.
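
The octave device amounts to counting in groups of eight, as the
following sketch shows. The function names are hypothetical, and the
restriction of the final digit to the range 1 to 8 is inferred from
the series of numbers given above.

```python
def octave_ordinal(code):
    """Position of a code within its subdivision: 1..8, then 91..98
    as 9..16, then 991..998 as 17..24, and so on."""
    nines = len(code) - len(code.lstrip("9"))
    last = code[nines:]
    if len(last) != 1 or last not in "12345678":
        raise ValueError(f"not an octave code: {code!r}")
    return 8 * nines + int(last)


def octave_code(ordinal):
    """Inverse of octave_ordinal for ordinals from 1 upwards."""
    nines, remainder = divmod(ordinal - 1, 8)
    return "9" * nines + str(remainder + 1)


print([octave_code(n) for n in (1, 8, 9, 16, 17, 18)])
print(octave_ordinal("992"))
```

Because each further nine opens another eight places, the series of
codes never runs out, which is the property the device was designed
to secure.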

While DC and UDC share a common notation they differ in the
means of building flexibility into it. In DC flexibility is provided
by "divide-like" instructions, through which the same string of
digits may be segmented differently to convey different information,
cross-references being made from one part of the general tables to
another. An example of the use of the instructions may be seen in
the classification of Proverbs 398.9 according to the language in
which they are written. While digits may be added in the way
described previously to this number to denote the subdivisions of
Proverbs, such an approach would fail to utilise the ready-made
categorisation of languages set up for another subject, that of
linguistics. The subdivisions of language are represented by the
numbers 420 to 490. Therefore, to subdivide Proverbs the instruction
'divide like 420 to 490' is provided, which means that the number for
a particular language is attached to the one for Proverbs but without
the initial 4. The number for Indo-Iranian Proverbs is 398.9911,
which is a conflation of Proverbs 398.9 and Indo-Iranian language
491.1. While in the general tables the "divide-like" instruction is
applied to increase the inventory of terms with the same notation,
the device could be extended indefinitely to represent synthetic
relations as well.
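
The conflation performed by a "divide-like" instruction may be
sketched as follows. The function name divide_like and its parameter
shared_prefix are hypothetical; the sketch assumes that the result is
written with the decimal point after the third digit.

```python
def divide_like(base, source, shared_prefix):
    """Attach `source` to `base` after removing the prefix that the
    instruction declares redundant (the initial 4 of 420 to 490)."""
    base_digits = base.replace(".", "")
    source_digits = source.replace(".", "")
    if not source_digits.startswith(shared_prefix):
        raise ValueError("source lies outside the range of the instruction")
    digits = base_digits + source_digits[len(shared_prefix):]
    return digits[:3] + "." + digits[3:]


# Proverbs 398.9 divided like Indo-Iranian language 491.1
print(divide_like("398.9", "491.1", "4"))
```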

The capacity to represent synthetic relations is provided
in UDC by auxiliary symbols. The most prominent of these is the
colon, which has the same meaning that the word and has in English.
This symbol allows strings of digits to be regrouped without a change
in their meaning. For example, 66 chemical technology and 653
industrial management can be synthesised into either 66:653 or 653:66
to denote 'management in the chemical industry'. The infixing of one
string inside another by means of square brackets provides for
further flexibility of representation. A grouping of given documents
by numerical order might occur as follows: 620.191: 669.3
'discoloration of copper', 620.191: 669.4 'discoloration of lead',
620.192: 669.3 'swelling of copper', or as follows: 620.19[669.3]1
'discoloration of copper', 620.19[669.3]2 'swelling of copper',
620.19[669.4]1 'discoloration of lead'. The first grouping collects
documents on discoloration and the second, those on the defects of
copper. Because of its provision for alternative ways of grouping
documents and of its capacity to convey information about them in very
great detail, UDC is not just a library classification but also an
information retrieval system.
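
The effect of the two notations on the order of filing may be
sketched as follows. The sketch assumes that "numerical order" can be
approximated by sorting the codes as character strings, that the
infixed string is placed before the last digit of the main number,
and that 669.4 stands for lead; infix_form and DOCUMENTS are
hypothetical names.

```python
def infix_form(main, inserted):
    """Place `inserted` in square brackets before the last digit of
    `main`, e.g. 620.191 and 669.3 give 620.19[669.3]1."""
    return f"{main[:-1]}[{inserted}]{main[-1]}"


DOCUMENTS = [
    ("620.192", "669.3", "swelling of copper"),
    ("620.191", "669.4", "discoloration of lead"),
    ("620.191", "669.3", "discoloration of copper"),
]

# Colon notation: documents on the same defect file together.
by_colon = sorted(f"{main}:{metal}" for main, metal, _ in DOCUMENTS)
# Infix notation: documents on the same metal file together.
by_infix = sorted(infix_form(main, metal) for main, metal, _ in DOCUMENTS)

print(by_colon)
print(by_infix)
```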

Since UDC is tied to the enumerative framework of DC, the
provision for alternative grouping is not consistent. This fact is
underlined by the apparent redundancy of certain digits in the
notation. In 633.15 - 272.6: 632.937 'Biological Control of Injuries
caused by Locust Pest to Corn', which is synthesised from 633.15 -
272.6 'Injuries caused by Locust Pest to Corn' and 632.937 'Crop
Protection by Biological Control', the digits 6 and 3 on both sides
of the colon signify twice over that the document comes under the
category of agriculture. This apparent redundancy is unavoidable,
since these digits are needed as null terms, as the number 2.937 by
itself has a different meaning from 2.937 in the string 632.937.

Bibliographic Classification (BC) and Subject Classification
(SC) provide means for representing synthetic relations. In BC the
comma serves the same function as the colon in UDC. The title
'Protection of Corn against Locust Pests in India in 1967', for
example, would be represented by the components UA Agriculture, QT
Corn, JQDL Locust Pest, H Protection, q India, and U Recent Period in
the formula UAQT, JQDL, Hq, U. In SC, the boundary between one
string and the next is marked by a change from a numerical to an
alphabetic base. A title like 'Unemployment in the Shipbuilding
Industry' would be represented by either B650L118 or L118B650.
Letters represent main classes. K, for example, signifies philosophy
and religion, and groups of digits (000 to 999) represent the
subdivisions of those classes. For instance, K951 signifies Catholic
Apostolic Church and K952, Christian Endeavour Society. To
intercalate new subjects, more digits are added.

The most inflexible of the general classifications with
respect to notation is LC, which is based upon an actual collection
of documents rather than on a theoretical map of knowledge. Its main
classes are as follows: A - General Works, Polygraphy; B -
Philosophy, Religion; C - History, Auxiliary Sciences; D - History
and Topography; E and F - America; G - Geography, Anthropology, Sport;
H - Social Sciences; J - Political Science; K - Law; L - Education;
M - Music; N - Fine Arts; P - Language and Literature; Q - Science,
General; R - Medicine; S - Agriculture, Plant and Animal Industry;
T - Technology; U - Military Science; V - Naval Science; Z -
Bibliography and Library Science. Subdivisions are made by means of a
second letter and four digits. While gaps between numbers provide
for expansion, the notation cannot accommodate synthetic relations.
Because of its lack of cross-referencing, LC may be considered as a
coordinated series of special classifications, each main class being
independent of the others. For example, in Statistics, periodicals
and congresses are represented as HA 1 and HA 9 to 11 respectively,
while in Economic Theory they are respectively HB 1 to 9 and HB 21 to
29.

In the other classifications the inflexibility of notation
is relieved by schedules. In SC, numbers belonging to a schedule
are distinguished by a preceding point. Thus I229.10 'History of
Landscape Gardening' may be recognised as a concatenation of I229
'Landscape Gardening' and .10 'History for general use in all
classes'. In DC, a preceding zero distinguishes a number that
belongs to a schedule. For example, 614.05 'periodical on public
health' is interpreted as 614 'public health' and 05 'periodical',
while 505 'periodical on science' is segmented into 5(00) science
and 05. In BC, where the main notational base is alphabetic, the
distinction is made by means of numbers so that a concatenation of
BOV and 3 History, for example, provides BOV3 'History of the BBC'.
In UDC, punctuation is used. For example, 624=30 is 'civil
engineering written in German' and 624(47) is 'civil engineering in
the U.S.S.R.'.

3.1.2  The enumerative framework of the general classifications
provides for a very economical representation of information. In DC
the meaning of a digit is determined not only by the column in which
it occurs, but also by what digit it follows. For example, 3
indicates a subdivision of technology after 6 in 63 agriculture but
one of Social Sciences after 3 in 33 economics. Such economy is
attained at the expense of the accommodation of synthetic relations,
which, as was observed in the last section, are only indirectly
represented. From the point of view of computation this drawback is
inconvenient.
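
The dependence of a digit on what precedes it can be made concrete
with a small lookup sketch. The four entries are the ones named in the
text; holding them in a dictionary keyed by the whole prefix, and the
function name read_number, are assumptions made for illustration.

```python
# Fragment of a DC-style schedule, keyed by the full digit prefix.
SCHEDULE = {
    "3": "social sciences",
    "33": "economics",
    "6": "technology",
    "63": "agriculture",
}


def read_number(number: str) -> list[str]:
    """Return the meaning of each successive prefix of `number` found in the schedule."""
    prefixes = (number[:i] for i in range(1, len(number) + 1))
    return [SCHEDULE[p] for p in prefixes if p in SCHEDULE]


print(read_number("63"))
print(read_number("33"))
```

The second digit 3 cannot be looked up alone: its meaning is only
recoverable through the prefix it closes, so a machine has to read the
number from the left rather than test one column.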

The notations of the general classification schemes were
created on the principle that any one term was the species of only
one genus as depicted by the tree of knowledge analogy. Such a view
does not take into account the complexity of lexical organisation.
Terms are like the elements of the mathematical equation, ab + bc +
ad, which may be represented as a(b + d) + bc or b(a + c) + ad, in
that they can be grouped according to more than one category. For
example, lamb may be placed with sheep under the category ovine or
with puppy and pony under the category young.

The consequences of rejecting the tree of knowledge analogy
may be observed in an adaptation of Sharp's example, in which the
categories into which terms are placed are explicitly indicated. In
a classification of military science a term like military aeroplanes
600 might be subdivided into three other terms fighters 610, bombers
620 and transport planes 630. Each of these might in turn be
subdivided. Thus fighters 610 would embrace the term 'single-engined
fighters' 611 and similarly bombers 620, the terms 'single-engined
bombers' 621, 'twin-engined bombers' 622 and 'three-engined bombers'
623. 'Transport planes' 630 would be subdivided into 'twin-engined
transport planes' 632 and 'three-engined transport planes' 633. In
this notation semantic categories are specifically designated by
columns. The tens column represents the genus of purpose, and the
units column, that of the number of engines. When a particular
category is not applicable, the others may be preserved by inserting
zero between other digits as a null term. Thus 'three-engined
military aeroplanes' would be 603.
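
A columnar notation of this kind can be read digit by digit. The
following sketch assumes the table names PURPOSE and ENGINES and the
helper names decode and with_engines, all of which are illustrative;
the numbers are those of the example above.

```python
# Meaning of each digit in the tens and units columns; 0 is the null term.
PURPOSE = {0: None, 1: "fighters", 2: "bombers", 3: "transport planes"}
ENGINES = {0: None, 1: "single-engined", 2: "twin-engined", 3: "three-engined"}


def decode(number: int) -> tuple:
    """Split a three-digit number into its tens (purpose) and units (engines) columns."""
    tens, units = (number // 10) % 10, number % 10
    return PURPOSE[tens], ENGINES[units]


def with_engines(numbers, count: int) -> list:
    """Select every number whose units column equals `count`, whatever precedes it."""
    return [n for n in numbers if n % 10 == count]


collection = [603, 611, 621, 622, 623, 632, 633]
print(decode(603))
print(with_engines(collection, 3))
```

A request for all three-engined aircraft is answered by testing one
column only, without regard to the digits that precede it.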

In the above notation the maximum capacity of representation
of a three digit number is twenty-seven terms (9+9+9), whereas
in DC the capacity is seven hundred and twenty-nine (9x9x9). If the
octave device is used, the difference increases. However, Sharp's
notation is more attractive from the point of view of mechanical
retrieval, since a machine may more readily retrieve a term by
identifying a digit in a specified column than one as it occurs after
certain other digits.
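
The two capacities follow the counts given in the text: a sum over
columns in the one case and a product in the other. Extending the count
to any number of columns, as the two hypothetical functions below do,
is an assumption made for illustration.

```python
def columnar_capacity(columns: int, digits: int = 9) -> int:
    """Each column contributes its own `digits` terms, so the counts add."""
    return columns * digits


def enumerative_capacity(columns: int, digits: int = 9) -> int:
    """Each digit is read in the context of its predecessors, so the counts multiply."""
    return digits ** columns


print(columnar_capacity(3), enumerative_capacity(3))
```

On these assumptions the gap widens rapidly with each further column,
which is the price paid for being able to test one column in isolation.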

Drawing the principle of flexibility to its logical
conclusion, the makers of the special classifications create notations
of unordered digits - or groups of digits (in which the octave device
is applied). The terms and their corresponding symbols for the
classification of military aeroplanes would be of the type,
single-engined 1, twin-engined 2, three-engined 3, fighting 4,
bombing 5, transporting 6, and military aeroplanes 7, and a document
entitled, for example, 'Three-engined Fighters' or 'Three-engined
Fighting Military Aeroplanes' would be represented by the number 347
or 743 or any combination of these digits. This type of notation is
adopted by Perry, Kent and Berry in their generic encoding, in which
groups of letters replace the above type of digits. The letter
combinations from left to right represent the classes and subclasses
into which a term fits. For example, animal is NA, mammal is NA MA,
dog is NA MA DO and terrier is NA MA DO TE. This systematic sequence
of letter combinations makes for human convenience and speed of
computation, although no ambiguity would result from any other
sequence. The pros and cons of such a notation will be discussed
further in section 2 under the heading of Coordinate Retrieval.
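
Since order carries no meaning in such a notation, a number behaves as
a set of symbols. A minimal sketch follows, in which the names SYMBOLS,
GENERIC, encode, same_subject and is_kind_of are all assumptions made
for the example; the symbols themselves are those given in the text.

```python
SYMBOLS = {
    "single-engined": "1", "twin-engined": "2", "three-engined": "3",
    "fighting": "4", "bombing": "5", "transporting": "6",
    "military aeroplanes": "7",
}


def encode(*terms: str) -> frozenset:
    """Represent a title as an unordered set of symbols."""
    return frozenset(SYMBOLS[t] for t in terms)


def same_subject(a: str, b: str) -> bool:
    """Two digit strings denote the same subject if they hold the same digits."""
    return frozenset(a) == frozenset(b)


# Generic encoding: each term carries the letter groups of all its classes.
GENERIC = {"animal": "NA", "mammal": "NA MA", "dog": "NA MA DO", "terrier": "NA MA DO TE"}


def is_kind_of(term: str, cls: str) -> bool:
    """A term belongs to a class if it carries every letter group of that class."""
    return set(GENERIC[cls].split()) <= set(GENERIC[term].split())


print(same_subject("347", "743"))
print(is_kind_of("terrier", "mammal"))
```

The class test uses set inclusion, so it gives the same answer however
the letter groups are sequenced.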

The exception among general classifications is Ranganathan's
Colon Classification in that it is not enumerative. Ranganathan
designated his categories (or "facets" as he calls them), of which
there are five, explicitly as follows: Personality [P], Matter [M],
Energy [E], Space [S], Time [T], which appear in this order in the
representation of document titles. Matter denotes materials, Energy
indicates an operation, process or problem to be solved, Space and
Time denote geographical and chronological subdivisions respectively
and Personality seems to include miscellaneous information. In the
representation of a title Personality is followed by a comma, Matter
by a semicolon, Energy by a colon, Space by a full-stop and Time by
an apostrophe. Encyclopedic information about the words that may
appear in titles is incorporated into the syntagmatic framework
described above by means of an index, in which the five categories
are distinguished by square brackets. For example, in the entry
'lending 2 [E], 62 X [P], 62 [E] 1', "lending" is interpreted as
belonging to class 2 library science, where it is denoted by the
number 62 in the Energy facet, and to class X Economics, where it is
denoted by 62 in the Personality facet and by 1 in Energy.

Various procedures govern the encoding of a title. In the
first, it is parsed with the aid of an index. A document entitled
'Spraying Instrument and Chemicals to Mitigate the Virulence of the
Injury to the Stem of the Rice Plant during the 1967 Dry Period in
the Cauvery Delta in Madras' would be analysed into components as
follows: Agriculture. Rice Plant [1P1]. Stem [1P2]. Injury [1M1].
Virulence [1M2]. Mitigation [1E]. Chemical [2M1]. Application [2E].
Spraying instrument [3M1]. Madras [S1]. Cauvery Delta [S2]. 1967
[T1]. Dry Period [T2]. In this notation the number before the
letter denotes which round a word of a particular category belongs
to and the number after the letter denotes its level. A round of
facets is a clause consisting of Personality, Matter and Energy, any
of which may be absent in any particular title. Where two or more
facets of the same type, two Personalities for example, occur in the
same round, they are said to belong to different levels and are
correspondingly assigned different numbers. After all the rounds
have been represented, the Space and Time facets are inserted. When
a title has thus been parsed, numbers replace words to provide 'a
title in focal numbers', which in this example is as follows:
J, 318 [1P1].^[1P2].'+[1M1] Oc7[1M2].5[1E].3[2M1],7[2E].5pa 74^^11
[S1] e50c[S2].N67[T1] el[T2]. In the final class number the
punctuation specified earlier replaces the bracketed tags.

In the classifications of the various fields of knowledge
the relationships between them are represented by relationship items
(or phases) in what Ranganathan calls a phase analysis. This analysis
is related to Micklesen's division of knowledge into idioglossaries
(described in chapter 1, section 1.3.2.1.3) based on the assumption
that some fields of knowledge may be considered as compounds of main
classes, geophysics, for example, being the 'influence of geography
on physical science'. In the notation of CC the subject is
represented by the formula, C0gU, where C and U represent the
respective main classes, physical science and geography, and where
zero indicates a change of phase and the lower-case character g
represents the phase of influence. Other relationship items are b
and k bias, c and m comparison and d and n difference. Ranganathan's
notation in providing a syntax represents a significant departure from
the norm of the other general classifications.

3.2  Special Classifications

3.2.1  In all the special classifications the categories of the
type described in section 3.1.2 are adopted, through which distinct
dictionary entries are formed, each with its own notation so that the
representation of a document is more than an idioglossary summary.
These classifications may be divided into two main types. In one, the
categories are represented by descriptors (or semantic factors as they
are sometimes called). In this type of approach, which is called
coordinate retrieval, the machine does not test for syntagmatic
relationships in the answering of an encoded request for documents,
but merely for the presence or absence of specified descriptors. The
machine, in fact, is often a card-sorter rather than a computer. In
the second type of classification, provision is made for syntagmatic
relationships. This type provides a more reliable base upon which to
set up a computerised semantics for mechanical translation and
information retrieval.

3.2.1.1  What all coordinate retrieval systems have in common may be
observed through a matrix. Ledley's "Tabledex" provides one, in
which the rows represent documents and their reference numbers, and
the columns, the descriptors. An intersection of a row and a column
(a posting) is assigned the digit, one, if a particular descriptor
does pertain to a certain document, and a zero, if it does not. A
user interested in documents having to do with the application of
nuclear theory, for example, would examine the columns of the
descriptors application, nuclear and theory in figure 3, until all the
rows in which the three postings of the descriptors contain ones
are identified. These rows indicate the requisite documents.
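
The search of such a matrix amounts to a conjunction over the chosen
columns. In the sketch below the document numbers and postings are
invented for illustration and do not reproduce figure 3; the names
DESCRIPTORS, MATRIX and search are likewise assumptions.

```python
DESCRIPTORS = ["application", "nuclear", "theory", "design"]

# Hypothetical postings: one row per document, one column per descriptor.
MATRIX = {
    101: [1, 1, 1, 0],
    102: [1, 0, 1, 1],
    103: [0, 1, 1, 0],
    104: [1, 1, 1, 1],
}


def search(*wanted: str) -> list:
    """Return the documents whose postings are 1 in every wanted column."""
    columns = [DESCRIPTORS.index(d) for d in wanted]
    return [doc for doc, row in MATRIX.items() if all(row[c] == 1 for c in columns)]


print(search("application", "nuclear", "theory"))
```

Only the presence or absence of each descriptor is tested; nothing in
the matrix records how the three descriptors relate to one another in
the document.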

For computational procedure the matrix has to be partitioned
into entries. Each entry may consist of either a descriptor followed
by references to documents to which it is pertinent or of a document
number followed by the descriptors which appropriately describe it.
Again the entry may consist of many descriptors followed by references
to many documents or vice versa. In the first case the entries are
of the type 'versatility: Powell, Tove' and 'analysis: Pope,
Stockendal, Tove'. In a search for documents having to do with
versatility analysis, Tove is retrieved by the matching of document
references. The speed of computation is guaranteed by the
alphabetical order of the descriptors.
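
Entries of the first type form what would now be called an inverted
file, and the matching of document references is an intersection of
sets. The sketch uses the two entries quoted above; the names ENTRIES
and retrieve are assumptions.

```python
# Descriptor entries of the first type: descriptor -> document references.
ENTRIES = {
    "analysis": {"Pope", "Stockendal", "Tove"},
    "versatility": {"Powell", "Tove"},
}


def retrieve(*descriptors: str) -> list:
    """Return the references common to every requested descriptor (at least one required)."""
    postings = [ENTRIES[d] for d in descriptors]
    return sorted(set.intersection(*postings))


print(retrieve("versatility", "analysis"))
```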

In the second case the entries are of the type, 'counting,
evaluation, versatility: Abrahams' and 'application, concept, design,
England: Smith'. In this type of organisation multiple entries are
often provided. For example, each of the entries is listed three
times in the first example and four times in the second so as to
bring each descriptor to the head of an entry. One of the alternative
entries to that of the first example might be 'evaluation, counting,
versatility: Abrahams'. With these multiple entries, the
identification of documents entered into an author-code dictionary is
aided by systematic organisation. Since every descriptor is now at
the head of some entry in the dictionary, alphabetical order may be
utilised, although this convenience is at the expense of an increase
in the number of entries.
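
The multiplication of entries can be sketched as follows. The rule
assumed here, that the promoted descriptor moves to the front while the
others keep their order, is inferred from the single alternative entry
quoted above; the function name rotated_entries is hypothetical.

```python
def rotated_entries(descriptors: list, reference: str) -> list:
    """Produce one entry per descriptor, each with a different descriptor at its head."""
    entries = []
    for i, head in enumerate(descriptors):
        rest = descriptors[:i] + descriptors[i + 1:]
        entries.append(([head] + rest, reference))
    return sorted(entries)  # alphabetical order of the heading descriptor


for entry in rotated_entries(["counting", "evaluation", "versatility"], "Abrahams"):
    print(entry)
```

An entry with n descriptors is thus filed n times, which is the
increase in the size of the dictionary that the text mentions.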

An example of the third type of entry is to be found in
Mooers' Zatocoding.²⁰ The representation of the descriptors is
accomplished by assigning to them certain numbers, which are notched
on to the top of a card. For example, where the descriptors 'selective
device' (3,11,15,39) and 'film tally' (14,17,22,30) appear on the same
entry card, the numbers 3,11,14,15,17,22,30, and 39 are notched in
this order. Below the notched portion each descriptor is printed with
its document references as shown in figure 4. Since the machine is
searching for numbers specifically rather than for descriptors and
document references, the numbers are set up carefully. If there had
been other descriptors 'photography' (14,11,15,39) and 'film
production' (3,17,22,30) both represented by notches on this same
card, the above notched numbers would be ambiguous because many
descriptors might be retrieved through the same numbers.
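
The ambiguity can be reproduced by treating the notches on a card as
the union of the codes of its descriptors. The four codes are those
given in the text; the names CODES, notch and selects are assumptions
made for the sketch.

```python
CODES = {
    "selective device": {3, 11, 15, 39},
    "film tally": {14, 17, 22, 30},
    "photography": {14, 11, 15, 39},
    "film production": {3, 17, 22, 30},
}


def notch(*descriptors: str) -> set:
    """Superimpose the codes of all descriptors on one card."""
    notches = set()
    for d in descriptors:
        notches |= CODES[d]
    return notches


def selects(notches: set, descriptor: str) -> bool:
    """A card is selected when every number of the descriptor's code is notched."""
    return CODES[descriptor] <= notches


card = notch("selective device", "film tally")
print(sorted(card))
print(selects(card, "photography"))
```

The card was never indexed under 'photography', yet it is selected
for it, because the numbers of that code are all supplied by the two
descriptors that were entered.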

    Descriptors                 Zatocodes

    selective device             3  11  15  39
    film tally                  14  17  22  30
    photo-electric sensing       1  11  34  40
    audio frequency code         9  16  29  31
    [illegible]                  ?  18  29  34
    flash                       17  23  34  38
    [illegible]                  ?  26  33  37

    Reference
    U. S. Patent No. 2,295,000
    Rapid Selector-Calculator
    Richard S. Morse, Rochester, N. Y.
    one claim

Figure 4: Format of Mooers' entry card.

3.2.1.2  The functioning of coordinate retrieval methods depends on
an appeal to practical criteria, namely the limited number of ways in
which words are in practice construed in natural language. Mooers
says: "In analysis we make no attempt to take the message of a
document and to write a little abstract using descriptor words in
such a way that the message of the document is preserved.... At the
symbolic and coding level retrieval, and not message preservation
must be our goal". While the number of possible sentences in a
language is infinite and therefore beyond the scope of a finite
number of descriptors, out of which only a finite number of
combinations may be formed, not all sentences are likely to occur in
document titles.

Coordinate retrieval systems have been found to work well in
the encoding of diagrams, in which the number of different components
is limited. At the United States Patent Office²² the diagrams of
chemical structures provided ready-made descriptors in the form of
the atoms in these structures. These were used in what the Patent
Office group called the first topological system, in which each atom
was assigned an identification number in terms of which a request was
formulated. In mechanical procedure, an atom to atom match was made
between the encoded form of a user's request and each chemical
structure for which there was a document and in the case of
structural correspondence, the document reference was printed out.
Since the diagrams were able to represent thousands of compounds, the
first topological system was replaced by a second one, in which the
number of descriptors was reduced by representing not atoms but
groups of atoms in a compound set up according to the likelihood of
their being requested.

The capacity of coordinate retrieval systems may be extended
without an increase in the number of descriptors by the judicious
assignment of many meanings to each descriptor. Gardin found that
in archaeological drawings some components were predictable from the
presence of others. For example, a man usually milks a ewe into a
container, shears it with a tool and feeds it with fodder. For these
situations the most appropriate descriptors would appear to be man,
ewe, container, tool, fodder, milk, and feed. Since selectional
restrictions limit the number of possible combinations of such
descriptors, the principle of economy is taken into consideration.
Gardin himself assigns many meanings to descriptors according to
connotation. White represents non-pejorative acts including milk,
shear and feed, and black pejorative ones like kill and strangle.

Brisch,²⁴ in his building classification, similarly
utilises many meanings, although they are so apportioned to
descriptors ("chapeaux" in this scheme) that the resolution of them
requires a builder's intuition. The "chapeau" is a two-digit number
with a broad range of meanings. The first digit represents a main
class and the second a subclass - according to the principles of
Dewey's decimal notation. The number 60, for example, refers to
'Functional components. Elements. Parts of buildings. Installations'
and 62 refers to 'Foundations. Walls. Damp-proof courses.
Partitions. Pillars. Arches. Dressings. Flashings'. This range of
meanings is narrowed down by forming the combination 62-24-42 from two
other "chapeaux", 24, 'Bituminous materials. Mastics. Lubricants.
Fuels and Gases' and 42, 'Bricks. Blocks. Slabs. Slates. Tiles
and Shingles'. The matching of meanings by someone with a builder's
training would reveal that a document referenced by 62-24-42 referred
to 'a bituminous, damp-proof course in a brick wall' or 'a bituminous
dressing on a brick wall'. The setting up of many meanings is
intricate and could make the updating of a coordinate retrieval system
more difficult than it has to be.

To overcome the obstacle of syntagmatic relationships in
the setting of coordinate retrieval systems, stopgap measures have
been suggested. Taube²⁵ points out that a document on 'fish as food'
may be distinguished from one on 'food for fish' by assigning the
descriptors food and fish to the first document and food, fish and
plankton to the second. Role indicators too have been advocated to
discriminate between not only homonyms of the type base (alkali) and
base (foundation), but also between the different roles that the same
term may play. For example, 'lead as product' is assigned the
descriptor lead I and 'lead as raw material', lead II. Role
indicators may function like Latin endings to represent different
syntactic functions. For example, 'A man attacks a lion' would be
represented by the three descriptors, man I, lion II and attacks,
while 'A lion attacks a man' would be assigned the descriptors,
man II, lion I and attacks. The distinguishing of role indicators by
Roman numerals above is strictly mnemonic, a means by which a human
may keep track of the descriptors. From the computational point of
view it is sufficient that man I is different from man II just as it
is different from attacks. While role indicators effectively remove
ambiguity they do so at the expense of an increase in the number of
descriptors.
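
The trade-off can be shown with the two sentences of the example. The
sketch treats each descriptor, with or without its role indicator, as
an opaque string, as the text requires; the function names plain and
with_roles are assumptions.

```python
def plain(sentence) -> frozenset:
    """Unordered descriptors without role indicators."""
    return frozenset(term for term, _ in sentence)


def with_roles(sentence) -> frozenset:
    """Unordered descriptors in which each term is fused with its role indicator."""
    return frozenset(f"{term} {role}".strip() for term, role in sentence)


man_attacks_lion = [("man", "I"), ("lion", "II"), ("attacks", "")]
lion_attacks_man = [("man", "II"), ("lion", "I"), ("attacks", "")]

print(plain(man_attacks_lion) == plain(lion_attacks_man))
print(with_roles(man_attacks_lion) == with_roles(lion_attacks_man))
print(len(with_roles(man_attacks_lion) | with_roles(lion_attacks_man)))
```

Without indicators the two sentences collapse into one set of three
descriptors; with them the sentences are kept apart, but the vocabulary
needed to cover both grows from three descriptors to five.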

3.2.1.3  The necessity of syntagmatic structure is a fact of not
just language but any notation with the capacity to convey an
infinite amount of information through rules of grammar that allow
the construction of syntagms that are necessarily infinitely long
from a finite vocabulary. Circumstantial evidence may be found in a
corollary of Gardin's classification of ornaments. In this
classification a given decoration is assigned a symbol (called a
radical) and the operations that it undergoes are denoted by affixes.
For example, the decoration [drawing] is named fix and a plurality of
them are fixuli or uli fix. At this point the radical and affix
appear to be unordered descriptors. However, fix may denote an
operation as well as a particular decoration. The ornament [drawing]
is called 'fix uli FIX' ('fix uli' in the shape of a fix). The
repetition of fix, albeit capitalised in the second instance, is
evidence of syntagmatic grouping.

While no ambiguity would result from scrambling the
radicals and affixes in the above example, other examples may be
found in which it would result. One may consider a decoration of the
type [drawing], which will be named circ. With this notation the
ornament [drawing] would be labelled 'fix uli circ' (a group of fix
in a circle) and [drawing], 'circ uli fix' (a group of circles in the
shape of a fix). The two ornaments are represented by the same
elements but in different groupings. In the first case fix and uli
are immediate constituents, whereas in the second circ and uli are.
Role indicators could be utilised to discriminate between the
ornaments by means of unordered descriptors. With Gardin's
capitalisation constituting an indicator, the first ornament would be
represented by the string of elements 'fix uli CIRC' and the second,
by the string 'FIX uli circ'. These elements are unordered but only
because of an increase by two in the vocabulary of elements to
include CIRC and FIX. To be able to represent by such means all
possible combinations of symbols and operations that might occur in
other subjects as well as archaeology, the vocabulary would have to
contain an infinite number of elements. Alternatively, a machine
might be programmed to abstract the meaning of a role indicator in
isolation. Such a procedure, however, would involve syntagmatic
structure. In 'fix uli CIRC', CIRC is one element on a higher
linguistic level, but two elements on a lower level, namely circ and
capitalisation.

The type of notation in which information is conveyed by
pairs of items consisting of a term and a role indicator (Fillmore's
case) is a concession to the necessity of conveying syntagmatic
structure. The term resembles Gardin's radical, and the role
indicator, the operation it undergoes. An example from chemistry, of
an experiment in which hydrochloric acid and marble chips are mixed
to produce carbon dioxide, may be considered. This situation may be
represented as follows: (Hydrochloric acid, Agent 1) (Marble chips,
Agent 2) (Carbon dioxide, Final product), in which the first
component in each bracket is a term and the second, a role indicator.
The pairs (liquid, Property), (solid, Property), and (gaseous,
Property) might also be pertinent to the situation. The integration
of them, however, into the rest of the notation would call for more
linguistic levels as expressed by further parentheses as follows:
(liquid, Property (Hydrochloric acid, Agent 1)) (solid, Property
(Marble chips, Agent 2)) (gaseous, Property (Carbon dioxide, Final
Product)). The parentheses are required, since not only can
individual terms and role indicators be immediate constituents, but
also pairs of items.

In some emended versions of coordinate retrieval of the
type used at the U.S. Patent Office, linguistic levels are indicated
by interfixes rather than by brackets. Before interfixes were
introduced, the descriptors lead, copper, coatings and pipes,
pertaining to a request for documents on 'lead coatings for copper
pipes', would each be assigned the same document number, 100. While a
document with the above title would be retrieved, one on 'copper
coatings for lead pipes' might also be retrieved. Under the interfix
system lead and coatings would each be assigned the document number
100A, and copper and pipes, the number 100B. The matching of the
interfixes (A or B) would show the links between descriptors so as to
discriminate between a request for '(lead coatings) (copper pipes)'
and one for '(lead pipes) (copper coatings)'. Since in practice a
user is often unable to retrieve what he wants, it is desirable to
break a request into kernel parts, in this example to retrieve
documents on 'lead coatings', 'copper pipes', 'coatings for copper'
and 'coatings for pipes'. Since the meaning of for is needed to
distinguish 'copper coatings' from 'coatings for copper', one must
provide a more complicated set of interfixes than that of the U.S.
Patent Office and Taube, in the following representation: lead 100A,
coating 100A Q3 P3, for 100 Q3 P3, copper 100B Q3, pipes 100B P3. The
link between copper, for and coating is shown by the interfix Q3, and
the one between pipes, for and coating, by the interfix P3. In each
case the number designates the quantity of items linked.
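
The simple A and B interfixes can be sketched as postings that pair a
document number with a link letter. The names POSTINGS, plain_match and
linked_match are assumptions made for the example, and the Q3 and P3
interfixes are left out for brevity.

```python
# Each posting pairs a document number with the interfix that links descriptors.
POSTINGS = {
    "lead": {(100, "A")},
    "coatings": {(100, "A")},
    "copper": {(100, "B")},
    "pipes": {(100, "B")},
}


def plain_match(*descriptors: str) -> set:
    """Match on document numbers alone, ignoring interfixes."""
    docs = [{doc for doc, _ in POSTINGS[d]} for d in descriptors]
    return set.intersection(*docs)


def linked_match(*groups: tuple) -> set:
    """Each group of descriptors must share an interfix within the same document."""
    result = None
    for group in groups:
        shared = set.intersection(*(POSTINGS[d] for d in group))
        docs = {doc for doc, _ in shared}
        result = docs if result is None else result & docs
    return result or set()


print(plain_match("lead", "pipes", "copper", "coatings"))
print(linked_match(("lead", "coatings"), ("copper", "pipes")))
print(linked_match(("lead", "pipes"), ("copper", "coatings")))
```

The unlinked search returns document 100 for either request, while the
linked search returns it only for '(lead coatings) (copper pipes)'.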

The interfixes do not denote the type of relationship
between terms but merely the presence of one. Even with their
inclusion a document entitled 'The Destruction of Dyestuffs by
Bacteria' would be encoded no differently from one entitled 'The
Destruction of Bacteria by Dyestuffs', in which case the numbers
would be of the type: 200AB Destruction, 200A Bacteria, 200B
Dyestuffs, where 200 is the document reference and A and B are the
interfixes. To discriminate between the two titles either of and by
must be included as descriptors or the above descriptors must be
assigned role indicators.

3.2.2  There are two methods of denoting the type of relationship
existing between any two terms. One consists of fusing together
analets (a construction consisting of a relationship item sandwiched
between two terms). The other is a development of the role indicator
method. An example of the first method is Farradane's system, in
which there are nine operators, as follows:

                  Non-time            Temporary          Fixed
                  relation            relation           relation

    Concurrent    Concurrence /θ      Comparison /*      Association /;

    Not distinct  Equivalence         Dimensional /+     Appurtenance /(
                  [symbol illegible]

    Distinct      Non-equivalence     Reaction /-        Causation /:
                  (Distinctness)
                  [symbol illegible]

Figure 5: Farradane's Operators
To apply these relationship operators, expressions not amenable to
them such as 'The cat eats the mouse' are normalised into expressions
of the type, 'The cat has food' and 'The food is a mouse', which are.
Compound constructions consisting of many analets are incorporated by
means of square brackets, the meaning of which may be observed by
examining three synonymous analets. For the document title 'The
Production of Glucose by Hydrolysis of Sucrose' the three possible
analets are as follows: Glucose :/ Sucrose -/ Hydrolysis, Glucose
:/ [Hydrolysis /-] Sucrose, and Sucrose [-/ Hydrolysis] /: Glucose.
An operator immediately outside the brackets does not link with the
term inside but with the one on the other side of them. These
brackets provide for variety in the representation of the same title.

Each of the operators is comprised of two components. For
example, causation consists of stating what was caused and stating
what did the causing, and reaction denotes the reactor and what was
reacted upon. In the role indicator method the title would be
represented as follows: ((Glucose, :) (Sucrose, /)) ((Sucrose, -)
(Hydrolysis, /)) or, if interfixes A and B are used, as follows:
(Glucose, :, A) (Sucrose, /, B) (Sucrose, -, A) (Hydrolysis, /, B).
From a comparison of the two types of notation the operator is
observed to be a concatenation of two role indicators, in which the
terms to which they belong appear on either side.

3.2.2.1  The use of analets was favoured by Gardin in SYNTOL
(Syntagmatic Organisation Language), which he applied to information
retrieval in general. In his notation the analets may be expressed in
terms of one dimension as Farradane's were, or in terms of two as
follows: [arrow diagram illegible], where m, n, p and q are terms and
the symbols written R are relationship items. This arrow format
resembles that of Schank's artificial intelligence, except that in
SYNTOL utterances are normalised to preserve the analet. An utterance
of the type 'a inhibits the effect of b on c' is not represented as
[arrow diagram illegible] but as [arrow diagram illegible], which
literally means 'a has an effect on b with respect to b's effect on
c'.

There are four main categories of relationships and four of
terms. The relationships are of the following kind, to which examples
of contexts in which they occur are appended: R1 predicative
('increasing unemployment'), R2 associative ('cancerous organs'),
R3 consecutive ('the effect of electricity on muscles') and R4
coordinative ('the differentiation of father and mother roles'). The
categories of terms are as follows: Predicates, Entities (E1 or E2),
States (S) and Actions (A). They may be observed in the
representation of the utterance 'degenerescence des muqueuses par un
acide' (degeneration of the mucous membranes by an acid) or more
specifically of its paraphrase 'action d'un acide sur la muqueuse :
degenerescence', the diagram of which links E2 (acide), R3, E1
(muqueuse), R2 and S (degenerescence). The category, Action, is
assigned to degeneration when it occurs in phrases of the type
'[illegible] degenerants', in which it constitutes an intrinsic
property of [illegible].

While the four categories of terms and relationships may
appear to be too few to prevent ambiguous representation in some
circumstances, provision is made for expansion. For the utterances
'benediction de l'eau' and 'la phobie de l'angoisse' the diagrams
'benediction --R2--> eau' and 'phobie --R2--> angoisse' are
inexplicit, because they do not differentiate the two meanings of de.
To make the diagrams more precise, expansions are provided as follows:
'benediction --R2--> eau --R1--> Ope instrumental' ('benediction
relates to water with respect to water being the instrument') and
'phobie --R2--> angoisse --R1--> Ope Signe' ('fear relates to anxiety
as anxiety is a sign of it'). With the generation of these diagrams, a
more appropriate name for R1 than predicative would be miscellaneous.
While the main part of SYNTOL has to do with analets, the expansions
involve role indicators, of which Ope instrumental and Signe
introduced by R1 are examples. The necessity for these expansions
tests the usefulness of the main categories.
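
The expansion mechanism may be pictured with a short Python sketch. It
is an illustration only and not part of SYNTOL: the class name
Syntagm, the function expand and the spelling of the operator labels
are assumptions made here. Each arrow is held as a triple, and an
expansion adds a second R1 triple that names the role of the
right-hand term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syntagm:
    """One arrow of a diagram: left term, relationship, right term."""
    left: str
    relation: str  # one of "R1".."R4"
    right: str


def expand(base: Syntagm, operator: str) -> list:
    """Attach an R1 expansion stating the role of the base's right-hand term."""
    return [base, Syntagm(base.right, "R1", operator)]


# Hypothetical encodings of the two utterances discussed in the text.
blessing = Syntagm("benediction", "R2", "eau")
phobia = Syntagm("phobie", "R2", "angoisse")

# Unexpanded, both diagrams carry the same relationship and so do not
# separate the two meanings of 'de'; the expansions do.
blessing_expanded = expand(blessing, "Ope instrumental")
phobia_expanded = expand(phobia, "Ope Signe")
```

The unexpanded triples differ only in their terms, which is the
inexplicitness noted above; the second triple of each expanded chain
is what distinguishes the instrument reading from the sign reading.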

While the arrow diagrams represent the content of a
discourse in detail, provision is made for an idioglossary type of
summary. In Gardin's approach there are two main components. One of
them, Source, provides bibliographic details about a document such as
its date and the original language in which it was written. The
other, Content, lists all the descriptors appropriate to the text of
a document under seven headings: Scale, Theme, Focus, Beings, Space,
Time and Mode. In one example, that of a document on 'Lesions and
Behaviour', the abstract was as follows: 'Modification of behaviour
produced by lesions of the frontal and temporal lobes. Experiments
on 36 monkeys and 10 cats. After temporal lesions, the animals are
calm, cease to distinguish between edible and non-edible things.
After frontal lesions they are apraxic and timid, a few manifest
hyperactivity'. The headings and the terms listed under them for
this abstract are as follows: 'Scale physio-psychology, Theme
telencephalon attitudes, Focus 3, Beings cats, monkeys, Space,
Time, Mode experiment'.

3.2.2.2 The role indicator method to which reference has been made
in the previous paragraphs was applied at Western Reserve University.
The role indicators (as described in Section 3.2.1.2) are analogous
to Latin endings of the type a, which identifies a noun like puella
(girl) as the subject of a sentence. The analogy holds with respect
to the constraints of linguistic levels (discussed in section 3.2.1.3)
insofar as the endings allow free word order only within a given
phrase or clause. At Western Reserve University, a "telegraphic
grammar" provided the means of representing various linguistic levels.
In it, one symbol denoted the beginning or end of a paragraph, '&'
that of a sentence, '-' that of a phrase and another that of a
subphrase. In addition, worksheets specified citation orders for the
role indicators and their terms. In the encoding of the text of each
document, neither parentheses nor interfixes were used to indicate
grouping.

The telegraphic grammar consists of two parts, one for
analytic and the other for synthetic relationships. In the
representation of the former, the role indicators take the form of
infixes, which are based on an analogy with Semitic languages. For
the latter, the indicators are in the shape of separate three-letter
combinations.

The infixes are added to semantic factors, which designate
the most frequently applied concepts. A recognised method or
technique like 'X-ray diffraction' or 'induction heating' would be
counted as one multiple-word term to which a group of analytically
related factors would be assigned.

How the factors function may be observed in the construction
of a code for the terms tempering, stress relief and stress
relieving. The metallurgist, using his expert judgment, would
provide for these terms the factors M-TL (metal), P-SS (physical and
chemical operations) and R-HT (processes and devices directly
involving heat), in which the symbol '-' specifies that the
appropriate infix is undetermined. The infixes are filled in by
consulting a table. Since the metal is something acted upon, the
infix 'W' (from the table) is inserted to give the complete factor,
MWTL. Similarly PASS (in which 'A' is derived from the table) conveys
the fact that tempering and stress relief are species of the genus,
physical and chemical operations. RQHT denotes that the processes
involving heat are the means to an end. The infixes having been
determined, tempering, stress-relieving and stress relief are assigned
the following respective groups of factors (which are unordered):
MWTL. PASS. RQHT. 013, MWTL. PASS. RQHT. 014, MWTL. PASS. RQHT.
014. The capitalised factors designate the content of the terms,
'Physical or chemical operation, metal, by means of heat'. The
numbers have to do with synonymy. When they match, the synonymy
between two terms is complete, otherwise it is only partial.
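
The filling in of infixes and the test for synonymy can be sketched
as follows. This is a minimal illustration under stated assumptions
and not the Western Reserve procedure itself: the names INFIX_TABLE,
fill_infix, encode and synonymy are invented here, the table holds
only the three infixes mentioned above, and the factor spellings and
synonymy numbers follow one reading of the example.

```python
# Hypothetical infix table holding only the three infixes named in the text.
INFIX_TABLE = {
    "acted_upon": "W",  # the factor names something acted upon
    "genus": "A",       # the term is a species of the factor's genus
    "means": "Q",       # the factor names the means to an end
}


def fill_infix(factor: str, role: str) -> str:
    """Replace the '-' placeholder of a semantic factor by the infix for role."""
    if factor.count("-") != 1:
        raise ValueError(f"expected exactly one '-' placeholder in {factor!r}")
    return factor.replace("-", INFIX_TABLE[role])


def encode(factor_roles, number):
    """Return the unordered group of completed factors and the synonymy number."""
    return frozenset(fill_infix(f, r) for f, r in factor_roles), number


def synonymy(a, b) -> str:
    """Same factors and same number: complete; same factors only: partial."""
    factors_a, number_a = a
    factors_b, number_b = b
    if factors_a != factors_b:
        return "none"
    return "complete" if number_a == number_b else "partial"


ROLES = [("M-TL", "acted_upon"), ("P-SS", "genus"), ("R-HT", "means")]
tempering = encode(ROLES, "013")
stress_relieving = encode(ROLES, "014")
stress_relief = encode(ROLES, "014")
```

A frozenset is used because the factors of a group are unordered; the
number is kept apart from them since it bears on synonymy rather than
on content.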

In the second part of the telegraphic grammar, role
indicators and terms are paired off, the former preceding the latter.
The representation of the utterance 'in the analysis of Ni, Co ions
interfere; Zn ions do not...' may be considered, which may be
confused with the one for 'in the analysis of Ni, Zn ions interfere;
Co ions do not...', for example, if syntagmatic links are ignored.
The representation of it is as follows: -KEJ (material processed)
Ni, -KAM (process) analysis, -KAM (process) interference, KQJ (by
means of) ion, KUJ (component) Co, -KXM (process negation)
interference, KQJ (by means of) ion, KUJ (component) Zn. The dash
before each role indicator segments the representation into the
kernel components, Ni, analysis, 'interference by means of Co ions'
and 'non-interference by Zn ions'. This segmentation is a pragmatic
measure. While the presence of more than one role indicator of the
same kind might otherwise permit ambiguity within the representation
as a whole, the segments are so constructed that in each one every
indicator differs from every other one.
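
The effect of the dash may be shown by a brief sketch. It is not the
Western Reserve encoding procedure; the function names segment and
indicators_unique and the list layout are assumptions adopted for the
illustration. The encoded utterance is held as pairs of role
indicator and term, a new segment opens at each dashed indicator, and
each segment is then checked for repeated indicators.

```python
def segment(pairs):
    """Split (indicator, term) pairs into segments at each dashed indicator."""
    segments = []
    for indicator, term in pairs:
        if indicator.startswith("-") or not segments:
            segments.append([])
        segments[-1].append((indicator.lstrip("-"), term))
    return segments


def indicators_unique(pairs):
    """True when no role indicator occurs twice among the given pairs."""
    names = [indicator for indicator, _ in pairs]
    return len(names) == len(set(names))


# The example utterance, transcribed from the representation in the text.
ENCODED = [
    ("-KEJ", "Ni"), ("-KAM", "analysis"),
    ("-KAM", "interference"), ("KQJ", "ion"), ("KUJ", "Co"),
    ("-KXM", "interference"), ("KQJ", "ion"), ("KUJ", "Zn"),
]

SEGMENTS = segment(ENCODED)
```

Taken as a whole the representation repeats KAM, KQJ and KUJ, whereas
within each of the four segments every indicator occurs once, which is
the property the segmentation is meant to secure.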

For retrieval, not only documents but also requests for
documents have to be encoded. The encoding of a request is an
exercise in marshalling encyclopedic data to specify what is implicit
in it. In one for 'References to papers in which the electron band
theory has been applied to a study of beryllium' the explicit
key terms are beryllium (BQE) and 'electron band theory'. Further
terms are derived through a knowledge of physics. A physicist would
recognise that this request concerned energy (N-RG) and grouping
(G-RP) with respect to the arrangement (R-NG) and location (L-CN) of
the electrons (P-.PH.6) of beryllium. Accordingly the formula for
requesting documents is: BQE. P-.PH.6. (G-RP. (N-RG+R-NG)+L-CN+N-RG).
The notation works as follows: A.B means that the search is
for documents characterised by the two descriptors A and B in this
order, (A+B) that it is for those referenced by A and/or B, and (A.B)
that only documents characterised by both A and B are requested. The
above notation is a reformulated request for documents concerning the
location and/or energy of beryllium electrons or the energy and
arrangement of groups of beryllium electrons. The reformulation
consists of selecting documents that partly satisfy a request on the
hypothesis that the full request cannot always be satisfied. While
this notation is comprehensive, it was designed for the encoding of
discourse by man rather than by machine.
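
How such a request formula selects documents may be sketched in
Python. The sketch rests on several assumptions: the function name
matches and the nested-tuple form of the request are invented here,
the order of descriptors is ignored, and the grouping of the bracketed
part of the formula is one possible reading rather than an
authoritative one.

```python
def matches(query, descriptors):
    """Evaluate a nested ('and' | 'or', operand, ...) query against a descriptor set."""
    if isinstance(query, str):
        return query in descriptors
    operator, *operands = query
    results = [matches(operand, descriptors) for operand in operands]
    if operator == "and":   # the A.B form: both descriptors required
        return all(results)
    if operator == "or":    # the (A+B) form: either descriptor or both
        return any(results)
    raise ValueError(f"unknown operator {operator!r}")


# One reading of the beryllium request (the grouping is an assumption).
REQUEST = (
    "and", "BQE", "P-.PH.6",
    ("or", ("and", "G-RP", ("or", "N-RG", "R-NG")), "L-CN", "N-RG"),
)
```

A document indexed by BQE, P-.PH.6 and L-CN satisfies the request,
while one indexed by BQE, P-.PH.6 and G-RP alone does not, since
grouping must be accompanied by energy or arrangement.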

The special classifications as a whole provide indices
based on expert opinion by which to represent encyclopedic knowledge,
but do not relate them to linguistic elements for a complete
computational analysis. The Western Reserve group note that
between the words steel, hardening, quenching and hardness the
relationships may be expressed in many ways in English: 'Steel is
hardened by quenching', 'Quenching hardens steel' and 'Quenching
produces hardness in steel'. However, no further inquiry is made.
Yet the categories are of interest in that they show the deep
structure of which the three sentences are surface structures. From
a cursory examination it appears that quenching may be tagged
(technique), harden (process + property), steel (material), produces
(process) and hardness (property) to show how a notation for a
discourse is derived from dictionary entries.

From an examination of the examples given in the special
classifications, two principles emerge. While the Western Reserve
group distinguishes analytic from synthetic relationships by format,
the successful mechanical manipulation of data does not require such
differentiation. A document title of the type 'Solubility of
Bactericides Containing Mercury' could be represented according to
its paraphrase 'solubility of (things that destroy bacteria)
containing mercury' as: (Solubility, properties given) (thing,
properties given for) (bacteria, product) (destruction, process)
(mercury, constituent), where the same brackets denote both analytic
and synthetic relationships. The key to a computerised semantics
lies in the uniform representation of both linguistic and encyclopedic
knowledge. To encode 'Russia is making a survey of mineral resources',
one would not hesitate to adopt the more explicit paraphrase 'Some
people in a government organisation in Russia are making a survey of
mineral resources'.

4 GROUNDWORK FOR A COMPUTERISED DICTIONARY

While the classifications elaborated upon in Chapter 3
provide semantic categories for the normalisation of natural language
discourse, the task of translating from natural language into code and
vice versa is left to man rather than machine. The notation for
computationally construing dictionary entries and abstracting
syntactic contexts step by step is not provided. Ranganathan's syntax
of rounds and levels has significance in that it attempts to represent
the linguistic levels according to which natural language is organised.
Through the representation of the notation for his 'rice plant'
example (Chapter 3, section 3.1.2) in the form of a tree, Ranganathan's
syntax is seen to resemble transformational grammar and Tesniere's
work which preceded it. Where the colon classification has five
different facets, transformational grammar has parts of speech. The
rounds and levels correspond to the N (Noun) P (Phrase) and V (Verb)
P (Phrase) markers. However, Ranganathan's syntax stops short of
analysing specific syntactic contexts, which Su and Noel embark upon
in recent research in computational linguistics.

4.1.1 Noel attempts to show the importance of semantic categories
in a grammar, the lack of which has doomed various projects in
mechanical translation to failure. To do so he adapts cases of the
type advocated by Fillmore to make them fit a syntagmatic tree, which
he applies to compare the role of the conjunction and with that of
other function words. In many contexts this word acts only as a
connective to link utterances of the type 'John crossed the road' and
'Mary climbed a tree' into a larger sentence like 'Mary climbed a tree
and John crossed the road' or 'John crossed the road and Mary climbed
a tree' without changing the meaning of the discourse. In some
contexts, however, what precedes and is regarded as logically prior
to what follows it. The conjunction is then an asymmetric and. The
sentence 'The machine was designed AND is used to search information'
contains one and is represented in figure 6.

The nodes are of two kinds on this syntagmatic tree. One
kind indicates the linguistic levels of an utterance and is labelled
by an S followed by a number. At the S nodes a construction is
designated +main insofar as it is a sentence, embedded or otherwise,
and -main insofar as it represents a part of speech dependent on
something else, a noun phrase, for example. At the top of the tree S
zero represents the highest linguistic level, that of the discourse.
S1 and S2 designate its constituent structures. Further subdivisions
of either of these are denoted by S1' and S2'. The addition of primes
to the symbol designates progressively lower linguistic levels. At
this point the tree resembles immediate constituent analysis, where
the descriptive main and its operators, plus and minus, specify the
kind of dependency between constituents.

The second kind of node represents the cases and part of
speech categories, verb, object, instrument, cause and goal. Prior to
being parsed with these, the discourse is assigned an explicit
paraphrase to bring to light its kernel components. While at first
the sequence of words at the bottom of the tree in figure 6 may appear
inappropriate, it really is not when one considers it to be derived
from the sentence 'Someone designed the machine; someone retrieved
information by means of the machine'. How the cases function may be
observed through the following utterances: (a) 'The center stores
AND retrieves the stored information' (b) 'The storage of the
information RESULTS in the retrieval of this (stored) information'
(c) 'BY storing information, the center (is able to) retrieve this
stored information' (d) 'The center stores information IN ORDER TO
retrieve this stored information'. In each of these utterances there
are two sentences whether embedded or not. The first states an
action upon which the action of the second sentence depends, the
retrieval of information being dependent upon its being stored. The
first sentence in Noel's terminology is designated as the Cause and
the second as the Goal.

[Figure 6: Noel's syntagmatic tree. S0 dominates S1 (Cause), over the
terminals 'was designed the machine', and S2 (Goal, +purpose, -main),
with Instrument and Object over the terminals 'to retrieve the
machine' and 'information'.]

Differentiation of the above sentences is performed by the
cases Reason and Means, the species of Cause, and by Purpose and
Result, the species of Goal, which all specify the presence or
absence of human intervention. In sentence (a), since there is no
specification of human intervention, the species cases are left
unmarked. In sentence (b) the effect of storing information is
considered to be outside human control; the first embedded sentence,
then, is designated as -Reason, -Means and the second, -Purpose,
+Result. In (c) the effect of storing information reflects human
control but not necessarily human desire; the first embedded sentence
is correspondingly denoted by the cases -Reason, +Means and the
second by -Purpose and +Result. In sentence (d) the effect of storing
is in human control and reflects human desire; the first embedded
sentence, therefore, is designated +Reason, and the second,
+Purpose, -Result. In light of the above sentences asymmetric
conjunction is observed to be a means of leaving the semantic content
conveyed by the species cases unspecified. This neutralisation of
specification may be visualised through the tree diagrams (shown in
figure 8) of the sentences 'The Center's storage and retrieval of
information results in the Center's contributing to the development
of science' and 'The Center stores and retrieves information and
contributes to the development of science.'
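
The marking of the species cases can be condensed into a small
sketch. It is an interpretation and not Noel's own formulation: the
function names species_cases and render are invented, the two features
control and desire stand for human control and human desire, the signs
for sentence (c) follow one reading of the text, and the value given
to Means for sentence (d), which the text does not state, is an
assumption.

```python
from typing import Optional


def species_cases(control: Optional[bool], desire: Optional[bool]) -> dict:
    """Mark the species cases from two features; None leaves a case unmarked."""
    if control is None and desire is None:
        # Asymmetric 'and': human intervention is left unspecified.
        return {"Reason": None, "Means": None, "Purpose": None, "Result": None}
    return {
        "Reason": bool(desire),   # the Cause reflects human desire
        "Means": bool(control),   # the Cause is under human control
        "Purpose": bool(desire),  # the Goal is desired
        "Result": not desire,     # the Goal merely ensues
    }


def render(marks: dict) -> str:
    """Write the marked cases with plus and minus signs, omitting unmarked ones."""
    return ", ".join(
        ("+" if value else "-") + name
        for name, value in marks.items()
        if value is not None
    )
```

With neither feature specified the rendering is empty, which
corresponds to sentence (a); sentence (b) is obtained with both
features false and sentence (c) with control alone true.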

While Noel's cases have explanatory value, they need to be
pared and supplemented. Where the cases +Reason, +Means, +Purpose,
+Result and Instrument occur, corresponding categories of the type
+Human +Wish, +Human +Control, +Human +Wish, -Control and -Human
+Control could be substituted to provide more explicitness and greater
economy in the inventory of cases. In order to implement Noel's case
system the semantic categories of words surrounding and must be
examined. While 'He stored and retrieved information' and 'He
retrieved and stored information' are not interchangeable in Noel's
contexts, the sentences 'He ate and baked the bread' and 'He baked
and ate the bread' are, in the context of a preceding sentence 'Some
people baked the bread and some ate it,' where the actions of the two
verbs are considered to happen within one instant of time rather than
consecutively.



[Figure 8a: Noel's syntagmatic trees. Tree for 'The Center's storage
and retrieval of information results in the Center's contributing...'.
The Cause node is marked -Reason, -Means and the Goal node -Purpose,
+Result; S1 and S2 are -main, and the terminals are 'Center stores
inf.', 'Center retr. inf.' and 'Center contributes...'.]

[Tree for 'The Center stores and retrieves information and
contributes...'. S1 and S2 are +main; the Goal carries the labels
Purpose and Result, and the terminals are 'Center stores inf.',
'Center retr. inf.' and 'Center contributes to...'.]

Figure 8b: Noel's syntagmatic trees.

4.1.2 A notation for computationally construing dictionary
entries for an unambiguous representation of natural language
discourse is provided by Wilks in his artificial intelligence. It
is composed of semantic elements, which integrate the functions of
descriptors and parts of speech. These elements form a dictionary
definition of each of the possible translations ("stereotypes") of a
word, thereby constituting part of a "formula". A formula embraces
both semantic elements and a stereotype. For the English word red
there are two French alternatives rouge and socialiste according to
Wilks and therefore two formulae as follows:
(((WHERE SPREAD) KIND) (RED (ROUGE)))

((((WORLD CHANGE) WANT) MAN) (RED (SOCIALISTE))).

The last two items in each formula constitute a stereotype and the
rest are semantic elements, of which the rightmost one indicates what
part of speech to expect. Where the stereotype includes information
about the target language's semantic and syntactic environment, it is
called a "full stereotype". In the English-French entry for the
word advise (as in 'to advise someone' and 'to advise someone to do
something') there are two full stereotypes, in the first of which the
meaning of FOLK as opposed to MAN is not apparent, as follows:
(ADVISE (CONSEILLER A (FN1 FOLK MAN))

(CONSEILLER (FN2 ACT STATE STUFF))).
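
The make-up of a formula may be made concrete by reading the two
formulae for red into nested lists. The sketch below is an
illustration only; the functions parse, split_formula and head are
names chosen here and are not part of Wilks's system.

```python
def parse(text):
    """Read a parenthesised formula into nested lists of symbols."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(position):
        items = []
        while position < len(tokens):
            token = tokens[position]
            if token == "(":
                sublist, position = read(position + 1)
                items.append(sublist)
            elif token == ")":
                return items, position + 1
            else:
                items.append(token)
                position += 1
        return items, position

    items, _ = read(0)
    return items[0]


def split_formula(formula):
    """Return (semantic elements, source word, stereotype) of a formula."""
    elements, (word, stereotype) = formula
    return elements, word, stereotype


def head(elements):
    """The rightmost semantic element, which signals the part of speech."""
    return elements[-1]


ROUGE = parse("(((WHERE SPREAD) KIND) (RED (ROUGE)))")
SOCIALISTE = parse("((((WORLD CHANGE) WANT) MAN) (RED (SOCIALISTE)))")
```

For the first formula the rightmost semantic element is KIND and for
the second MAN, so that the two readings of red are distinguished by
their elements before the French stereotype is consulted.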

In the text, a grammatical sequence of words is often
distinguished from an anomalous one by the fact that it matches one
of a list of permissible syntagms of semantic elements called bare
templates. For example, the grammaticality of the sentence 'John
(MAN) owns (HAVE) a car (THING)', in which the capital letters and
brackets denote semantic elements, is registered by its matching the
bare template, MAN HAVE THING. The identification of the template
enables the computer to supply two-way dependency links for this
sentence as follows: John <-> owns <-> car. An expansion of the above
sentence like 'John owns a large (KIND) car' is represented by the
addition of a one-way link to provide the full template, John <->
owns <-> car <- large. These two types of link constitute Schank's
conceptual structure. The representation of zero indices in a
sentence is provided by "dummy" (D) semantic elements. For example,
in 'John (MAN) talked (TELL DTHIS)' DTHIS denotes the zero presence of
an object, and therefore that the verb is intransitive.

Bare templates are the criteria not only for establishing
syntagmatic links but also for resolving ambiguities (which pertain to
the individual word) and amphibologies (which have to do with a whole
construction). Potentially the sentence 'This green bicycle (THING)
is (BE) a winner (MAN? THING?)' may be parsed as THING BE MAN or THING
BE THING. Since only the second parse matches a template, it is
identified as the correct one. Where more than one template is
applicable to an utterance, as in the amphibology 'They are eating
apples,' which the templates MAN DO THING and THING BE THING fit,
surrounding utterances will have to be searched, in this sentence, to
identify the nearest antecedent of they.
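
Template matching of this kind can be sketched in a few lines. The
sketch makes several assumptions: the inventory BARE_TEMPLATES
contains only the three templates named in the text, the function
name matching_templates is invented, and each word is given simply as
the tuple of semantic elements its senses allow.

```python
from itertools import product

# Hypothetical inventory: only the bare templates mentioned in the text.
BARE_TEMPLATES = {
    ("MAN", "HAVE", "THING"),
    ("MAN", "DO", "THING"),
    ("THING", "BE", "THING"),
}


def matching_templates(word_senses):
    """Return every choice of one element per word that is a bare template."""
    return [
        parse for parse in product(*word_senses) if parse in BARE_TEMPLATES
    ]


# 'This green bicycle is a winner': winner may be MAN or THING.
bicycle = matching_templates([("THING",), ("BE",), ("MAN", "THING")])

# 'They are eating apples': two templates fit, hence an amphibology.
apples = matching_templates([("MAN", "THING"), ("DO", "BE"), ("THING",)])
```

A single surviving parse resolves an ambiguity; more than one, as
with the second sentence, signals that the surrounding utterances
have to be consulted.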

The criteria for partitioning a text into portions for
template matching are punctuation marks, subjunctions, conjunctions
and prepositions or a keyword. One such keyword is of, which in the
sentence 'He has a book of mine' marks off a portion suitable for
matching the template MAN HAVE THING. Where a word functions as a
keyword only in some contexts, an absence of suitable templates to
match the portions of a text will indicate an incorrect segmentation.
That the partition of 'He (MAN) gave (DO) up (PDO) his post', in
which up is a keyword, is incorrect is registered by the lack of a
template corresponding to MAN DO PDO.

Dependency links between portions of a text are made through
the keywords and what are called marks. In the sentence '(He came
home) (from the war),' in which the brackets indicate partitions made
by computer, the keyword is from and the mark to which it is linked is
came. The detection by computer of links between portions is
essential for the translation of ambiguous prepositions, the resolution
of which requires the context beyond the prepositional phrase itself.
For the preposition "out of" there are three alternative translations
into French, which may be observed in the following sentences:
(1) 'It was made (PRMARK *DO) out of (PRCASE SOURCE) (de) wood
(FN1 STUFF THING)', (2) 'He killed (PRMARK *DO) him out of (PRCASE
SOURCE) (par) hatred (FN2 FEEL)' and (3) 'I live out of (PRCASE LOCA)
(en dehors de) town (FN1 POINT SPREAD)'. The capitalised items
represent semantic elements that either correspond to the dictionary
entry of each word or are established in the course of analysis, and
PR and LOCA are abbreviations for prepositional and location. The
full dictionary entry for 'out of' consists of the following three
stereotypes:
((PRCASE SOURCE) (PRMARK *DO) DE (FN1 STUFF THING))
((PRCASE SOURCE) (PRMARK *DO) PAR (FN2 FEEL))
((PRCASE LOCA) DEHORS DE (FN1 POINT SPREAD)).

The cases have to do with the preposition itself. The rest of the
information in the stereotype is a statement of the syntactic and
semantic environment of the preposition which is required for a given
French translation to be assigned.
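
The selection of a French translation from the three stereotypes may
be sketched as follows. This is a simplification and not Wilks's
procedure: the dictionary layout, the function name translate_out_of
and the treatment of FN1 and FN2 as a plain subset test on the
semantic elements of the prepositional object are all assumptions
made for the illustration.

```python
# Hypothetical encoding of the three stereotypes for 'out of'.
OUT_OF = [
    {"case": "SOURCE", "mark": "*DO", "french": "de",
     "object": {"STUFF", "THING"}},
    {"case": "SOURCE", "mark": "*DO", "french": "par",
     "object": {"FEEL"}},
    {"case": "LOCA", "mark": None, "french": "en dehors de",
     "object": {"POINT", "SPREAD"}},
]


def translate_out_of(mark_elements, object_elements):
    """Return the French form of the first stereotype whose environment fits."""
    for stereotype in OUT_OF:
        required_mark = stereotype["mark"]
        mark_fits = (
            required_mark is None
            or required_mark.lstrip("*") in mark_elements
        )
        if mark_fits and stereotype["object"] <= set(object_elements):
            return stereotype["french"]
    return None  # no stereotype applies
```

For instance, translate_out_of({"DO"}, {"FEEL"}) selects par, since
the mark is an action and the object denotes a feeling, whereas an
object that fits none of the three environments yields no translation.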

4.2 Structure of a Comprehensive Computerised Dictionary

4.2.1 Wilks' mechanical intelligence as an effective means of
integrating syntactic and semantic categories provides the foundation
for a computerised semantics. It may be built upon by the addition
of Katz and Fodor's marker theory and by the idioglossary (notional
family) approach to represent linked syntagms on the nodes of a tree.
While the content of a dictionary entry is finite and therefore does
not require a notational syntax considered necessary in chapter 3 for
the encoding of a text, Katz and Fodor's unordered markers are on a
precarious footing. They are parts of dictionary definitions between
which the syntagmatic links have been severed. The consequence of
such severing is pointed out by Vickery. He indicates that a
notation of simple interfixes would be inadequate to differentiate
the utterance 'the destruction of dyestuffs by bacteria' from 'the
destruction of bacteria by dyestuffs' by linking dyestuffs and
bacteria to destruction. Without indicating the dependency links
between these words as shown by of and by in natural language the
notation can only go so far as to represent the utterance 'destructive
relationships between dyestuffs and bacteria'. The absence of the
links marks the sentence as the genus of the previous two utterances,
just as the absence of a symbol, in the following case DO, marks the
word animal AN MA the genus of dog AN MA DO in Perry's code. A
limited basis for unordered elements lies in the integration into the
dictionary structure of idioglossaries, the use of which does not
require manipulation of the syntax of a text.

The idioglossary approach capitalises on the fact that a
word central to the topic of a text tends to be used consistently in
the same sense so that where the local context of an ambiguous word
fails to resolve it, a likely meaning may be assigned in default. In
the sentence "Gray and his collaborators concluded that the suckers
were acting as time signals for each phase of the movement, and are
normal but not essential channels for peripheral excitation" that
sucker refers to an animal's anatomy in the context is determined by
the topic, annelids. As a relatively simple approach, the
idioglossary is worth incorporating into a semantic tree. Ambiguous
words occur, however, that cannot be treated by the method. While
sucker will refer to fish in a text on fish in most local contexts,
there will be exceptions, which may be observed in the following
quotations: "Sucker, a freshwater fish with thick soft lips that
form a suckerlike mouth" and "Catostomidae (suckers) The suckers,
a family of fishes..., with the mouth so constructed that it can form
a tubelike sucker." Because of the necessity of examining the
local contexts of words the idioglossary approach must be supplemented.
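
The default assignment may be sketched as a simple fallback. The
sketch is illustrative only: the table IDIOGLOSSARY, its sense labels
and the function name resolve are assumptions, and the local context
is reduced to the set of senses it leaves open.

```python
# Hypothetical idioglossary: the sense a word usually has under a topic.
IDIOGLOSSARY = {
    "annelids": {"sucker": "organ of attachment"},
    "fish": {"sucker": "freshwater fish"},
}


def resolve(word, local_senses, topic):
    """Use the local context if it settles the sense; else fall back on the topic."""
    if len(local_senses) == 1:
        return next(iter(local_senses))
    return IDIOGLOSSARY.get(topic, {}).get(word)
```

The second quotation above shows why the local context is consulted
first: in a text on fish, resolve("sucker", {"organ of attachment"},
"fish") returns the anatomical sense, whereas the topic default alone
would have given the fish.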






Despite the faults of Katz and Fodor's tree of markers, the
structuring of definitions by means of it to show how a word's
different meanings are related is a starting point for a more
comprehensive tree. Since such a tree is subject to the criticism
made in chapter 3, section 3.1.2, of all trees, that they cannot
simultaneously represent all dimensions of knowledge, the searching
of descriptors for matching purposes will not necessarily be in the
convenient form envisaged by Katz and Fodor, namely, the branch by
branch construing of information from the top of a tree to the bottom.
For example, in a sentence in which the word sucker (figure 11)
occurs, components that match N-thing-ER in terminal 2 may be
identified before ones matching BY DO suction. Since the tree
represents an entry in an English-Code dictionary and not a
Code-English one and thereby corresponds to the author-descriptors
entry of coordinate retrieval and not the descriptor-authors one of
coordinate retrieval, it has all the difficulties of organisation that
the former type of entry was claimed to have (chapter 3, section
3.2.1.1). Since some form of organisation is preferable to none, the
tree will nonetheless be maintained.

To construct the comprehensive tree, modifications have been
made of Katz and Fodor's tree. The residue of information that they
put in the form of distinguishers at the bottom of the tree has been
structured on to the tree itself. The bottom of the tree now contains
stereotypes or full stereotypes, which will be called terminals. One
might consider retaining the plus and minus indicators and the
descriptives of the marker tree within the framework of the new tree
as some of many semantic elements. However, the limitation of the
branching at the nodes to two, imposed by the indicators, and the
consequent inflexibility in the setting up of categories would
obstruct the purpose of the tree. Partial trees for the words plant
and shoot may be considered. The binary branching into plus and
minus categories in diagram 9 is consistent with what Katz and
Fodor envisaged, but does not provide functioning trees. Matching of
markers in the sentence 'The plant's shoots expanded' does not reveal
the contextual terminals of plant and shoots because both the marker
(+living) and (-living) of each word finds a counterpart in the other.
The defect lies in the indiscriminate application of the minus
operator. The marker (-living) while symmetrical with (+living) on
the alternative branch is vague and needs to be divided into more
specific ones to prevent terminal 1 of shoot and plant from being
selected.

[Figure 9: Binary branching of marker trees. Partial trees for plant
and shoot, with nodes including (+industry), (+biological organism)
and (+contest), each ending in terminals 1) and 2).]
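
The failure of binary markers in this example may be reproduced in a
short sketch. It is an illustration under assumptions: the marker sets
given to each terminal are hypothetical and are only loosely based on
the labels of figure 9, and the function name shared_marker_pairs is
invented.

```python
def shared_marker_pairs(tree_a, tree_b):
    """List the pairs of terminals whose marker sets have a marker in common."""
    return sorted(
        (i, j)
        for i, markers_a in tree_a.items()
        for j, markers_b in tree_b.items()
        if markers_a & markers_b
    )


# Binary marking: terminal 1 of each word is merely (-living).
PLANT_BINARY = {1: {"-living"}, 2: {"+living"}}
SHOOT_BINARY = {1: {"-living"}, 2: {"+living"}}

# More specific markers in place of the vague (-living).
PLANT_SPECIFIC = {1: {"+industry"}, 2: {"+living"}}
SHOOT_SPECIFIC = {1: {"+contest"}, 2: {"+living"}}
```

With the binary marking both terminal pairs (1, 1) and (2, 2) match,
so that nothing is selected; once (-living) is replaced by more
specific markers only the pair (2, 2) remains.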

With emendations, Katz and Fodor's tree resembles a
conventional dictionary, in which the numbering of the definitions
constitutes an organisation of them into semantic categories. The
setting up of a tree for the word sucker as defined in Harrap's
French dictionary reveals that the definition numbers are in fact
encoded markers. As a by-product of trees constructed for
computation, a dictionary would emerge in which there would be
sufficient control of the wording of the definitions for relationships
between words as well as meanings to be displayed.

4.2.2 To demonstrate the operation of the emended version of Katz
and Fodor's tree, which includes Wilks' context-specifying
descriptors, the ambiguous words sucker and b.onto have been selected.
Comparable words are [illegible]. Semantic elements were derived for
the above two words by examining actual usage in texts and
encyclopedias. In addition, Webster's Seventh Collegiate dictionary
was combed for words such as swim in 'The sucker swims' that are
permitted by selectional restrictions to form a direct syntactic link
with sucker. These words were sorted into semantic categories. The
two sources of categories were then collated. In the setting up of
them the requirements for disambiguation were found to diverge from
those for other purposes of computation. To meet the former, the
semantic element human was sufficient to identify the terminals of
sucker having to do with moral wrong. To make the tree an
all-purpose one, all information was encoded.

Figure 11: Model tree for "sucker"

In the computational analysis of the local contexts of ambiguous
words, the simplest operation consists of testing immediate
constituents such as an adjective and a noun, a subject and a verb,
and a verb and its object, and it has been seized upon by
computational linguists. Booth, Brandwood and Cleave's^18 (Chapter 1,
section 1.3.2.1.1) and Masterman's^19 (Chapter 1, section 1.2.2.2)
concept numbers are essentially a list of interfixes stating which
pairs of words may become immediate constituents without forming
anomalous constructions. The interfixes are like Katz and Fodor's
markers but are not factored into semantic categories. The concept
number technique is plausible in that the number of interfixes,
though large, will be finite since only pairs of words are linked,
but it has limitations. In the sentence 'I see suckers', for example,
see does not disambiguate sucker. Therefore, the limitation of the
trees for sucker and baste to application within a single sentence is
but a short-term expedient based on the insuperable difficulty of
encoding all encyclopedic knowledge.
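
The pair-testing operation described above can be pictured as a
table lookup. The following is a minimal sketch only, assuming a
hypothetical interfix table: the sense labels, the pairs listed and
the name `senses_allowed` are invented for illustration and are not
taken from the concept-number lists themselves. It shows a neighbour
such as swim narrowing sucker to one sense, whereas see admits every
sense and so resolves nothing.

```python
# Hypothetical interfix table: (neighbouring word, noun sense) pairs
# that may form immediate constituents without anomaly.
# All entries are assumptions made for this illustration.
INTERFIXES = {
    ("swim", "sucker/fish"),
    ("see", "sucker/fish"),
    ("see", "sucker/person"),
    ("see", "sucker/pump part"),
    ("fool", "sucker/person"),
}

SUCKER_SENSES = ["sucker/fish", "sucker/person", "sucker/pump part"]


def senses_allowed(neighbour, senses):
    """Return the senses that the neighbouring word admits."""
    return [s for s in senses if (neighbour, s) in INTERFIXES]


after_swim = senses_allowed("swim", SUCKER_SENSES)  # one sense survives
after_see = senses_allowed("see", SUCKER_SENSES)    # every sense survives
```

Because the table links only pairs of words, it stays finite, but a
neighbour that is compatible with every sense contributes nothing to
the choice between them.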

To organise semantic categories in terms of syntactic structure,
semantic elements, including what will be called linguistic
descriptors, have been mapped on to a tree to show what the
definitions of different terminals have in common, without destroying
the syntagmatic links between the components of each definition. For
example, where one terminal of the word sucker is defined as 'a young
unweaned whale' and another as 'a young unweaned tame pig', the two
definitions are synthesised as follows: young unweaned N-animal BE:
{whale / tame pig}, of which the first four components are
represented on the upper part of the tree in figure 11.
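
One way to picture this synthesis is as a prefix tree in which
definitions that begin with the same components share the same upper
nodes. The sketch below is an illustration under stated assumptions,
not the thesis's own procedure: the terminal labels, the placement of
'tame' after BE:, and the function names `build_tree` and
`shared_prefix` are all hypothetical.

```python
# Component sequences for two terminals of "sucker".
# The terminal labels and the position of "tame" are assumptions.
DEFINITIONS = {
    "young whale": ["young", "unweaned", "N-animal", "BE:", "whale"],
    "young pig": ["young", "unweaned", "N-animal", "BE:", "tame", "pig"],
}


def build_tree(definitions):
    """Merge component sequences into nested dicts; common prefixes share nodes."""
    tree = {}
    for terminal, components in definitions.items():
        node = tree
        for component in components:
            node = node.setdefault(component, {})
        node["#terminal"] = terminal  # mark the end of one definition
    return tree


def shared_prefix(definitions):
    """Return the leading components common to every definition."""
    prefix = []
    for items in zip(*definitions.values()):
        if all(item == items[0] for item in items):
            prefix.append(items[0])
        else:
            break
    return prefix


tree = build_tree(DEFINITIONS)
common = shared_prefix(DEFINITIONS)
```

The order of components inside each definition is preserved along
every path from root to terminal, so the syntagmatic links within a
definition are not lost when the definitions are merged.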

Figure 12: Model tree for "baste"

The linguistic descriptors are a refinement of the traditional parts
of speech for parsing, and thereby cover the area to which
Bar-Hillel^20 applied his categorial grammar. In the tree diagram for
sucker that follows, linguistic descriptors are designated by capital
letters. Their functioning may be observed in the words employer,
employ and employee, from the suffixes of which some of the
descriptors have been derived. In the usual paraphrases of them
different words would be used in each case, as follows: 'one who
employs someone', 'employ' and 'one who is employed by someone'.
Insofar as a word's semantic and syntactic environment contributes to
its meaning just as much as any paraphrase does, the three words
merit the same notation 'N-human-ER DO-employ N-human-EE', in which
'N-' means noun, '-ER' subject, '-EE' object and 'DO-' verb. Insofar
as the words belong to different parts of speech they are underlined
differently in their definitions, as follows:
'_N-human-ER_ DO-employ N-human-EE', 'N-human-ER _DO-employ_
N-human-EE' and 'N-human-ER DO-employ _N-human-EE_'. The underlined
portion is the nucleus of a word's definition and the rest is its
environment, a means of cross-referencing it to the other two words.
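
The relation between nucleus and environment can be sketched as a
shared notation plus a pointer to the component that serves as each
word's nucleus. This is only an illustrative encoding, assuming a
tuple representation; the names `NOTATION`, `NUCLEUS_INDEX`,
`split_definition` and `cross_references` are hypothetical.

```python
# The single notation shared by the three words.
NOTATION = ("N-human-ER", "DO-employ", "N-human-EE")

# Position of the nucleus (the underlined component) for each word.
NUCLEUS_INDEX = {"employer": 0, "employ": 1, "employee": 2}


def split_definition(word):
    """Return (nucleus, environment) for a word's definition."""
    i = NUCLEUS_INDEX[word]
    return NOTATION[i], NOTATION[:i] + NOTATION[i + 1:]


def cross_references(word):
    """Words whose nucleus appears in the environment of the given word."""
    _, environment = split_definition(word)
    return sorted(
        other for other, i in NUCLEUS_INDEX.items()
        if NOTATION[i] in environment
    )


nucleus, environment = split_definition("employee")
related = cross_references("employ")
```

Each word's environment is exactly the set of nuclei of the other two
words, which is what allows the environment to serve as a
cross-reference.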

In order to incorporate the idioglossary technique, the tree diagram
includes the linguistic descriptor IN. It is through it that the
hierarchy of idioglossaries elaborated upon by Micklesen^21
(Chapter 1, section 1.3.2.1.3) is represented. In the tree, hyponymy
relationships, whether in the environment of IN or of some other
linguistic descriptor, are denoted by BE: or :BE. The element nearest
the colon is the species and the one furthest away is the genus.
Whether BE: or :BE is used depends on how the tree branches and,
therefore, on where the semantic elements are placed.

4.2.3 By means of the trees for the two ambiguous words sucker
(figure 11) and baste (figure 12), it is possible to disambiguate one
word by means of the other. In the analysis of the sentence 'He
basted the sucker before eating it', the environment of DO sew at
terminal 1 of baste is matched with each nucleus of sucker for
elements corresponding to 'long loose stitches'. Since there are
none, terminal 2 is inspected and the common nucleus of terminals 11,
12 and 13 of sucker, N-human, is found to meet the requirement of an
object of baste (2). Terminal 3 of baste is similarly tested and the
common nucleus of terminals 7, 8, 9 and 10 of sucker is found to
qualify as an object of baste (3). To decide between terminals 2 and
3 of baste, the phrase 'before eating it' is examined, in which 'it'
is traced to sucker. The word 'it' in its context narrows down the
nucleus of sucker to N-animal, and therefore to terminals 7, 8, 9 and
10. As the object of baste belongs to the category animal, terminal 3
of this word is selected as its contextual meaning.
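
The steps above can be sketched as two passes over small lookup
tables: first pair each terminal of baste with the terminals of
sucker whose nucleus it accepts as object, then narrow the pairings
using the wider context. This is a hedged illustration only. The
nuclei for terminals 7 to 13 follow the text, but the object
requirements assigned to baste (2) and baste (3), the omission of the
remaining sucker terminals, and the names `compatible` and `narrow`
are assumptions.

```python
# Nuclei of the "sucker" terminals named in the text (others omitted).
SUCKER_NUCLEUS = {
    7: "N-animal", 8: "N-animal", 9: "N-animal", 10: "N-animal",
    11: "N-human", 12: "N-human", 13: "N-human",
}

# Assumed object requirement in the environment of each "baste" terminal.
BASTE_OBJECT = {1: "long loose stitches", 2: "N-human", 3: "N-animal"}


def compatible(verb_env, noun_nuclei):
    """Pair each verb terminal with the noun terminals it accepts as object."""
    pairs = {}
    for verb_terminal, required in verb_env.items():
        matches = [n for n, nucleus in noun_nuclei.items() if nucleus == required]
        if matches:
            pairs[verb_terminal] = matches
    return pairs


def narrow(pairs, noun_nuclei, required_nucleus):
    """Keep only pairings whose noun terminals carry the required nucleus."""
    narrowed = {}
    for verb_terminal, noun_terminals in pairs.items():
        kept = [n for n in noun_terminals if noun_nuclei[n] == required_nucleus]
        if kept:
            narrowed[verb_terminal] = kept
    return narrowed


# Pass 1: 'He basted the sucker' leaves baste (2) and baste (3) open.
pairs = compatible(BASTE_OBJECT, SUCKER_NUCLEUS)
# Pass 2: 'before eating it' narrows the nucleus of sucker to N-animal.
final = narrow(pairs, SUCKER_NUCLEUS, "N-animal")
```

After the first pass both baste (2) and baste (3) remain; only after
the second does baste (3) stand alone, with sucker still undecided
among terminals 7 to 10.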

Further disambiguation of sucker must wait for a wider context than
the single sentence given and a more comprehensive tree based upon
ample empirical evidence. Nonetheless, the one given illustrates the
interaction of semantic and syntactic categories. The linguistic
descriptors allow syntactic clues to pinpoint terminals. In the
sentence 'The sucker basted the meat', the fact that sucker is the
subject enables it to be categorised as human, which limits the
applicable terminals to 11, 12 and 13.

Since the definitions of terminals encoded on a tree are represented
in deep structure, sentences may have to undergo language
normalisation before the tree is usable. In order to pinpoint
terminal 2 as the most probable meaning of sucker in 'The sucker drew
the water up', this sentence must be assigned the paraphrase
'something (N-thing-ER) caused (DO CAUSE) the water (N-thing-EE BE
N-thing-ER) to rise (DO) by (BY) the suction (DO) of the sucker
(N-thing-ER)'. While the sentence does not perfectly match the
encoded definition for terminal 2, N-thing-ER DO CAUSE N-thing-EE BE
N-thing-ER DO imbibe BY DO suction, it matches this one more closely
than the ones for the other terminals of sucker.
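
The notion of a closest, though imperfect, match can be made concrete
with any sequence-similarity measure. The sketch below uses Python's
standard `difflib.SequenceMatcher`, which is a stand-in chosen for
illustration and not a measure proposed in the thesis. The notation
for terminal 2 follows the text; the two competing definitions are
placeholders invented for contrast, and the names `similarity` and
`closest` are hypothetical.

```python
from difflib import SequenceMatcher

# Notation of the normalised sentence, split into elements.
SENTENCE = (
    "N-thing-ER DO CAUSE N-thing-EE BE N-thing-ER DO BY DO N-thing-ER"
).split()

DEFINITIONS = {
    # Encoded definition of terminal 2, as given in the text.
    "terminal 2": (
        "N-thing-ER DO CAUSE N-thing-EE BE N-thing-ER "
        "DO imbibe BY DO suction"
    ).split(),
    # Placeholder competitors, assumed purely for this example.
    "human terminal (assumed)":
        "N-human-ER DO moral-wrong N-human-EE BY Lure".split(),
    "animal terminal (assumed)":
        "young unweaned N-animal BE: whale".split(),
}


def similarity(a, b):
    """Ratio in [0, 1] of matching elements between two element sequences."""
    return SequenceMatcher(None, a, b).ratio()


def closest(sentence, definitions):
    """Name of the definition most similar to the sentence notation."""
    return max(definitions, key=lambda name: similarity(sentence, definitions[name]))


best = closest(SENTENCE, DEFINITIONS)
```

A score below 1 for the winning terminal reflects the imperfect match
noted above: the sentence supplies bare DO where the definition
specifies DO imbibe and DO suction.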

The matching of a terminal's definition with a sentence shows how the
contextual meaning of a word is derived not only from a selection of
its dictionary alternatives, but also from the contextual meaning of
another word or words. For the sentence 'He fooled the sucker with
the stuffed tiger', the matching process shows how the stuffed tiger
comes to be viewed as a lure. The stuffed tiger is so considered
because of its link with a component of the definition of sucker (11)
in the following mapped-out version of the above sentence: 'He
(N-human-ER) fooled (DO moral wrong) the sucker (N-human-EE) with
(BY) the stuffed tiger (Lure)'.

Further developments on the tree presented so far may include the
incorporation of language normalisation programmes. A tree
accordingly equipped would be capable of taking into account the
facts covered by Fillmore's^22 cases, by matching the elements of an
utterance with a dictionary definition which in turn would provide an
explicit paraphrase of the utterance. An analysis of his sentences
'A man (N-human-ER) moved (DO move) the rock (N-thing-EE)', 'The wind
(N-thing-ER) moved (DO move) the rock (N-thing-EE)' and 'The rock
(N-thing-ER) moved (DO move)', mentioned in Chapter 2, section 2.2.1,
would centre on the word moved. That it does not have the same
function throughout becomes apparent in the following respective
paraphrases of the above sentences: 'A man caused the rock to move',
'The wind caused the rock to move' and 'The rock moved'. In order to
show how to arrive at these paraphrases computationally, the tree in
figure 13 is provided. To determine which branch is applicable to the
contextual function of move, the matching procedure described in
previous paragraphs is used. By this means the surface structure
notation for 'A man moved the rock' is recognised as 'N-human-ER DO
move N-thing-EE BE: rock'. The deep structure representation is
arrived at by replacing non-bracketed elements by bracketed ones that
follow an equals sign. For the above sentence it is: N-human-ER
(Means) CAUSE N-thing-ER DO Move. The viability of the above
procedure would depend upon how complex the relationship between a
given surface and deep structure was.

Often the key to disambiguation lies in a series of syntactic links.
The following extract^23 may be considered: "The working of a
Newcomen engine is a very painful process.... When the pump descends,
there is heard a plunge, a heavy sigh and a loud bump: then as it
rises, and the sucker begins to act, there is heard a creak, a
wheeze....". In this sample the disambiguation of sucker by means of
its link with pump relies on tracing the link through 'it'. Since
indirect linking of this type may take many forms, reliance will have
to be placed upon a documentary language of the kind developed at
Stanford or one based upon Gardin's^24 SYNTOL to reveal explicitly
the linking between elements in a text.

ENTRY = "move"

Figure 13: Deep and surface structure tree for "move"

ENTRY = 'I shot the man with a gun'
    N-I-ER DO shot N-human-EE
    HAVE gun
    N-I-ER

ENTRY = 'if he had had a gun too'
    N-human-ER
    not HAVE gun
    N-human-ER

Figure 14: Computationally constructed tree

Such a documentary language would be applicable to Katz and Fodor's
sentence (chapter 1, section 1.3.2.2) 'I shot the man with a gun, but
if he had had a gun too, he would have shot me first'^25. A likely
outcome of its use would be the components on the trees in figure 14.
In these trees the non-matching of 'N-human-ER HAVE gun' with
'N-human-ER not HAVE gun' would be the criterion for eliminating
terminal 2 as the appropriate element to which to link 'with'. To
arrive at these trees, some sophisticated form of language
normalisation would be needed. This type of amphibology, then, is the
point at which the tree's usefulness ceases and at which a grammar
takes over.
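
The non-matching criterion can be sketched as a check for two
notations that agree in subject and predicate but differ in polarity.
This is an assumption-laden illustration rather than the thesis's
procedure: it presupposes that language normalisation has already
produced the notations, treats 'the man' and 'he' as the same
N-human-ER, numbers the candidate attachments of 'with a gun' as
terminals 1 and 2, and uses the hypothetical names `parse` and
`contradicts`.

```python
def parse(notation):
    """Split a notation such as 'N-human-ER not HAVE gun' into
    (subject, negated, predicate)."""
    tokens = notation.split()
    subject, rest = tokens[0], tokens[1:]
    negated = bool(rest) and rest[0] == "not"
    if negated:
        rest = rest[1:]
    return subject, negated, tuple(rest)


def contradicts(a, b):
    """True when two notations differ only in polarity."""
    subject_a, negated_a, predicate_a = parse(a)
    subject_b, negated_b, predicate_b = parse(b)
    return (
        subject_a == subject_b
        and predicate_a == predicate_b
        and negated_a != negated_b
    )


# Candidate readings of 'with a gun' (terminal numbering assumed).
CANDIDATES = {1: "N-I-ER HAVE gun", 2: "N-human-ER HAVE gun"}
# Component derived from 'if he had had a gun too'.
CONTEXT = "N-human-ER not HAVE gun"

surviving = [
    terminal for terminal, reading in CANDIDATES.items()
    if not contradicts(reading, CONTEXT)
]
```

The check itself is trivial; the difficulty noted above lies in
producing the two notations from the raw sentence in the first place.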

The tree diagram suggested for the words "sucker" and "baste" is not
meant to be the last word even in the resolution of ambiguity alone.
As the representation of the vast body of encyclopedic knowledge to
which all words have reference would be an immense undertaking, this
thesis has necessarily been confined to formulating the questions
that need to be asked rather than finding answers. For a computer
programme the trees might be designated by one-dimensional bracket
formulae of the type advocated in chapter 2. However, no attempt has
been made to provide algorithms for them in this thesis, because the
formulation of the information contained in the trees has not been
based upon sufficient empirical evidence for the results of
computation to decide its validity. The verification, therefore, of
the procedure for constructing a computerised dictionary must await
further study.

1.
2. Tesnière 1959
3. Su 1971
4. Noel 1972
5. Fillmore 1969
6. Noel 1972
7. Wilks 1971
8. Schank 1969
9. Wilks 1971
10. Katz and Fodor 1963
11. Vickery 1965
12. Perry 1956
13. Dnle^^ 1963
14. Collier's Encyclopedia
15.
16. Harrap 1962
17. Wilks 1971
18. Booth, Brandwood and Cleave 1953
19. Masterman 1956
20. Bar-Hillel 1953
21.
22. Fillmore 1969
23. Smiles 1862
24. Gardin 1965
25. Katz and Fodor 1963